PATENT APPLICATION IN THE U.S. PATENT AND TRADEMARK OFFICE

for 

SYSTEM AND METHOD FOR INTELLIGENT LOAD DISTRIBUTION 
TO MINIMIZE RESPONSE TIME FOR WEB CONTENT ACCESS 

by 

Kasim Selcuk Candan and Wen-Syan Li 



CROSS-REFERENCE TO RELATED APPLICATIONS 

Embodiments of the present invention claim priority from U.S. Provisional 
Application Serial No. 60/230,564 entitled "Intelligent Load Distribution to Minimize User 
Response Time for Web Content Access," filed August 31, 2000. The content of this 
application is incorporated by reference herein. 



BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates, generally, to content delivery systems, and, in 
preferred embodiments, to systems and methods for intelligently distributing content provider 
server loads to minimize user response times for accessing Web content. 

2. Description of the Related Art 

As illustrated in FIG. 1, a conventional content delivery network 10 typically 
includes a plurality of end-users 16 (client browsers) and a plurality of content provider servers 
18 distributed over a large wide area network 14, such as the Internet. The wide area network 
14 may include smaller networks 20, which may roughly correspond to various geographic
regions around the world. When, for example, end-user A makes a request 22 for content
(e.g., HTML pages and embedded objects) from a content provider server 18, the content
provider server 18 may then deliver the requested content back to end-user A. However, due
to delays incurred as the request and content pass through multiple networks 20 and gateways
26, the overall response time seen by end-user A may be quite slow.

Overall response time comprises two elements: network delay and server
delay. Network delay is the delay incurred as requests and content pass through various
networks and gateways at the network boundaries, as described above. Server delay is the
delay in processing once the server actually receives the request. There are often trade-offs
between these two delay elements.

Mirror servers have been used to improve the performance of the Web as
observed by the end-users. (It should be understood that mirror servers 12, as defined herein,
may also include proxy servers and caches.) As illustrated in FIG. 2, in a conventional content
delivery system 10 employing mirror servers 12, content from a content provider server 18 is
copied into one or more of the mirror servers 12. Thereafter, for example, if end-user A sends
a request 22 to content provider server 18 for that content, the request may be redirected (see
reference character 24) to a mirror server B that stores a copy of that content. Because the
mirror server is often located geographically (or logically) close to the requesting end-user,
network delays, and therefore overall response times, may be reduced. However, the location
and load of the mirror server often play a large role in determining the actual response times
seen by the requesting end-user.

As a result, two approaches have been used to reduce response times, one based
on location, the other based on load. The location-based approach divides the wide area
network 14 or Internet into regions, often organized around the multiple networks 20 that form
the Internet. Powerful mirror servers 12 are then located in each region. In the example of
FIG. 2, mirror server B is located in region C. This approach aims to reduce the network
delay observed by the end-users 16 by redirecting content requests to mirror servers 12 located
geographically (or logically) close to the end-users 16.

In conventional content delivery systems employing a location-based approach,
all end-users within a particular region will be redirected to a mirror server in that region.
Such content delivery systems are constrained by regional boundaries, and do not allow an
end-user to be redirected to a mirror server in another region. Ordinarily, this limitation
produces fast overall response times, because the network delays incurred in crossing over
regional boundaries are avoided. However, this limitation may actually lead to higher overall
response times if the mirror server becomes overloaded.

For example, suppose the requests of many end-users 16 in region C have been
redirected to mirror server B, as illustrated in FIG. 2. Although network delays may be
minimized by such a mapping, if the number of requests exceeds the load capacity for mirror
server B, the server delay of mirror server B may increase dramatically, and overall response
times may become very slow. Assume also, for purposes of illustration only, that a
neighboring region D contains mirror server E, which also stores a copy of the requested
content, but has received few requests for content, and thus has minimal server delay. In this
example, although it would actually reduce the overall average response time for all end-users
in region C if some of the end-users in region C were redirected to mirror server E in region
D, the regional limitations of conventional location-based approaches will not allow it.

Conventional load-based approaches, on the other hand, aim to distribute the
load on all mirror servers evenly to prevent any single mirror server from becoming
overloaded. Content delivery systems employing a load-based approach do not consider
regional boundaries. Rather, such systems maintain statistics from actual requests, and attempt
to balance mirror server loads based on these statistics so that all mirror servers see an
approximately equivalent load.

Load-based approaches assume negligible network delays, but such an
assumption is not necessarily true. Ordinarily, load balancing produces fast overall response
times, because all of the mirror servers are experiencing a reasonable load, and therefore
server delays are minimized. However, load balancing may actually lead to high network
delays and higher overall response times if end-user requests are redirected across regional
boundaries in order to balance the mirror server loads. It should be noted that location-based
approaches to content delivery systems may also employ load balancing techniques within each
region.
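
The trade-off between the two conventional approaches can be illustrated with a small
numeric sketch. The delay model below (a flat server delay that jumps once a server exceeds
its capacity, plus a fixed network delay for crossing a regional boundary), the function names,
and every number are assumptions chosen for illustration only; none of them is taken from a
measured system.

```python
# Illustrative sketch only: the delay model and all numbers are assumptions.

def server_delay(load, capacity, base=1.0, overloaded=10.0):
    """Flat delay while load <= capacity; a much larger delay once overloaded."""
    return base if load <= capacity else overloaded


def average_response_time(assignment, capacity, network_delay):
    """assignment maps a server name to the number of requests sent to it."""
    total_requests = sum(assignment.values())
    total_delay = 0.0
    for server, load in assignment.items():
        per_request = server_delay(load, capacity[server]) + network_delay[server]
        total_delay += load * per_request
    return total_delay / total_requests


# 150 requests originate in region C. Mirror server B is local (region C);
# mirror server E is in neighboring region D, across a regional boundary.
capacity = {"B": 100, "E": 100}
network_delay = {"B": 0.0, "E": 2.0}

location_based = {"B": 150, "E": 0}  # never cross the regional boundary
load_based = {"B": 75, "E": 75}      # equalize load, ignore network delay
mixed = {"B": 100, "E": 50}          # fill B to capacity, send the rest to E

for name, assignment in [("location", location_based),
                         ("load", load_based),
                         ("mixed", mixed)]:
    print(name, round(average_response_time(assignment, capacity, network_delay), 3))
```

Under these assumed numbers, the location-based assignment averages 10.0 delay units per
request, the load-balanced assignment 2.0 units, and the mixed assignment about 1.67 units,
which shows why neither conventional approach is necessarily optimal on its own.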

Nevertheless, as reported in the literature, both approaches work reasonably
well when the Web objects stored in the mirror servers are large (such as images and streaming
media), although the overall response times of large objects are extremely sensitive to the
network conditions. When the object sizes are smaller (less than about 4 kB), as is the case for
most dynamic content, overall response times are less sensitive to the network delays, unless
the delivery path crosses geographic location barriers. In contrast, however, dynamic content
is extremely sensitive to mirror server loads, as the underlying databases or backend systems
are generally not very easy to scale up, and can become bottlenecks.

SUMMARY OF THE DISCLOSURE

Therefore, it is an advantage of embodiments of the present invention to provide
a system and method for redirecting end-users to mirror servers in the same region as the
requesting end-user, or other regions, using assignments that minimize the overall response time
seen by users of the content delivery system.

It is a further advantage of embodiments of the present invention to provide a
system and method for redirecting end-users to mirror servers using assignments that balance
the loads of the mirror servers while taking into account load capability.

It is a further advantage of embodiments of the present invention to provide a
system and method for redirecting end-users to mirror servers in the same region as the
requesting end-user, or other regions, using assignments that minimize the overall response time
seen by users of the content delivery system, wherein an increase in resources due to the
addition of a new mirror server or the service termination of a customer content provider will
not cause a load redistribution unless load balancing constraints are violated.

It is a further advantage of embodiments of the present invention to provide a
system and method for redirecting end-users to mirror servers in the same region as the
requesting end-user, or other regions, using assignments that minimize the overall response time
seen by users of the content delivery system, wherein a new customer content provider will be
added only if the overall response time is maintained below a specified threshold.

It is a further advantage of embodiments of the present invention to provide a
system and method for redirecting end-users to mirror servers in the same region as the
requesting end-user, or other regions, using assignments that minimize the overall response time
seen by users of the content delivery system, wherein changes to the loads or existing
customers will not change the overall response time so significantly that it exceeds a specified
threshold.

These and other advantages are accomplished according to a content delivery
system having m servers, $S = \{S_1, \ldots, S_m\}$, n active customers,
$C = \{C_1, \ldots, C_n\}$, and g geographic locations, $G = \{G_1, \ldots, G_g\}$, wherein
$sdel_k$ is a server delay of server $S_k$, $ndel_{j,k}$ is a network delay observed by
customers in geographic location $G_j$ while retrieving content from server $S_k$, $p_i$ is a
priority value for customer $C_i$, $c_i$ is a total load of customer $C_i$, $u_{i,j}$ is a
fraction of requests coming to customer $C_i$ from region $G_j$, $a_{i,j,k}$ is a mapping
representing a fraction of requests coming to customer $C_i$ from region $G_j$ that have been
redirected to server $S_k$, and $s_k$ represents a load capacity of server $S_k$. Within such a
system, a method for distributing server loads includes the steps of representing an average
prioritized observed response time as

$$AORT = \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} \sum_{k=1}^{m} a_{i,j,k} \times u_{i,j} \times c_i \times p_i \times (sdel_k + ndel_{j,k})}{\sum_{i=1}^{n} c_i \times p_i}$$

and then generating a mapping that assigns requests from customers to a particular server while
minimizing AORT. A heuristic algorithm is used to generate the mapping, wherein large
$a_{i,j,k}$ values are assigned to small $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ values
to produce a smaller overall AORT value.
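
One possible reading of this rule can be sketched as a simple greedy procedure: visit the
(customer, region, server) combinations from smallest to largest delay and give each the
largest fraction that the remaining server capacity allows. This is a hypothetical
simplification for illustration and is not asserted to be the heuristic algorithm of the
preferred embodiments; the function name `greedy_mapping` and the list-based data layout
are assumptions. Because $u_{i,j} \times c_i$ is constant for a given customer and region, the
sketch orders candidate servers by $sdel_k + ndel_{j,k}$ alone.

```python
# Hypothetical greedy sketch; not asserted to be the heuristic of the embodiments.

def greedy_mapping(c, u, sdel, ndel, s):
    """Return a[i][j][k], the fraction of customer i's region-j requests sent to server k.

    c[i]: total load of customer i; u[i][j]: fraction of that load from region j;
    sdel[k]: server delay; ndel[j][k]: network delay; s[k]: load capacity.
    """
    n, g, m = len(c), len(ndel), len(s)
    a = [[[0.0] * m for _ in range(g)] for _ in range(n)]
    remaining_capacity = list(s)
    unassigned = [[u[i][j] * c[i] for j in range(g)] for i in range(n)]

    # Visit (customer, region, server) triples from smallest to largest delay.
    triples = sorted(
        (sdel[k] + ndel[j][k], i, j, k)
        for i in range(n) for j in range(g) for k in range(m)
    )
    for _, i, j, k in triples:
        demand = u[i][j] * c[i]
        if demand == 0:
            continue  # no requests from this region for this customer
        load = min(unassigned[i][j], remaining_capacity[k])
        if load > 0:
            a[i][j][k] += load / demand
            unassigned[i][j] -= load
            remaining_capacity[k] -= load
    return a


# One customer, one region, three servers with network delays 0, 1, and 2 units.
example = greedy_mapping(
    c=[100.0], u=[[1.0]],
    sdel=[1.0, 1.0, 1.0], ndel=[[0.0, 1.0, 2.0]],
    s=[50.0, 30.0, 100.0],
)
print(example[0][0])
```

With these assumed capacities the nearest server is filled first and the remainder spills to
progressively more distant servers. If the total capacity is smaller than the total load, some
requests remain unassigned, which a complete implementation would need to handle.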

These and other objects, features, and advantages of embodiments of the 
invention will be apparent to those skilled in the art from the following detailed description of 
embodiments of the invention, when read with the drawings and appended claims. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 illustrates a conventional content delivery network in which an end-user 
(client browser) requests content from a content provider server located in another region 
through multiple gateways. 

FIG. 2 illustrates a conventional content delivery network in which end-users 
(client browsers) are redirected to request content from mirror servers located in the same 
region as the end-users. 

FIG. 3 illustrates a content delivery network according to preferred 
embodiments of the present invention in which end-users (client browsers) within a particular 
region are redirected to request content from mirror servers located in the same region as the 
end-users or other regions, using assignments that minimize overall response time. 

FIG. 4 is a graph illustrating the linearity of server load below a certain
threshold.

FIG. 5 illustrates an example of the heuristic algorithm for assigning end-users 
to mirror servers according to an embodiment of the present invention. 

FIG. 6 is a timeline illustrating an example of information stored in the TCP
logs indicating the times at which certain bytes of data have been sent to the client browser,
and the times at which acknowledgement for certain bytes of data has been received from the
client browser.

FIG. 7 illustrates a graphical representation of how the TCP information of FIG.
6 can be used to estimate per-byte delays according to embodiments of the present invention.

FIG. 8 illustrates a graphical representation of how HTTP information can be
used to estimate the overall response time observed by the end-user according to embodiments
of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

In the following description of preferred embodiments, reference is made to the
accompanying drawings which form a part hereof, and in which is shown by way of
illustration specific embodiments in which the invention may be practiced. It is to be
understood that other embodiments may be utilized and structural changes may be made
without departing from the scope of the preferred embodiments of the present invention.

Conventional location-based or load-based approaches to content delivery
systems are unduly limited, either by forced adherence to regional boundaries, or a lack of
consideration of regional boundaries. In contrast, embodiments of the present invention
consider regional boundaries and the network delay associated with crossing regional
boundaries, but do not strictly adhere to those boundaries. Rather, embodiments of the
present invention may intelligently redirect requesting end-users to mirror servers across
regional boundaries where doing so would minimize the overall average user response time.
Such a system attempts to balance the two elements of user response time, network delay and
server delay. In other words, it may be desirable to distribute end-user requests across
regional boundaries, if the penalty represented by the additional network delay is less than the
gain observed by the reduced load on the system.

System Model 

A dynamic content delivery network improves the performance (overall
response time) observed by clients (end-user browsers) of its customers (companies or
individuals that subscribe to the content delivery services). An example of a content delivery
system is CachePortal™, described in pending U.S. Patent Application No. 09/545,805,
entitled "System and Method for Efficient Content Delivery," filed April 7, 2000, the contents
of which are incorporated herein by reference. To improve the overall response time of
end-user requests, content delivery systems employ mirror servers that are distributed across the
Internet. More specifically, the task of any content delivery system is to distribute the load
generated by the clients of its customers across the mirror servers that it employs.

However, to efficiently distribute the load, content delivery systems according
to preferred embodiments of the present invention take into account the different observable
characteristics of each customer $C_i \in C = \{C_1, \ldots, C_n\}$, which may include, but are
not limited to:

• their published Web content;

• the size of their load requirement (in terms of the requests
generated by their clients per second); and

• the regional distribution of their load requirement (where
their clients are at a given time of the day).

In addition, preferred embodiments take into account different selectable
requirements of each customer, which may include, but are not limited to, the performance
guarantees that they require (the maximum response time that a client should experience).

It should be understood, however, that these characteristics, as well as the
network characteristics, can change during the day as the usage patterns of end-users shift with
time of day and the regional location. Therefore, a static solution (such as an optimal content
placement strategy) is not sufficient. Instead, embodiments of the present invention
dynamically adjust the client-to-server assignment for each customer.

A simplified explanation of the assignment process will now be provided.
Referring to the example system of FIG. 3 for purposes of illustration only, a content delivery
network 28 includes three geographic regions, a Pacific region 30, a Central region 32, and an
Atlantic region 34. Within the Pacific region 30, the total number of requests for a particular
customer at a particular point in time is represented by block 36. In addition, this example
system includes three mirror servers, a mirror server 38 located in the Pacific region 30, a
mirror server 40 located in the Central region 32, and a mirror server 42 located in the
Atlantic region 34.

As described earlier, if all of the requests 36 are redirected to mirror server 38,
the overall response time may not be minimized due to excessive loads on mirror server 38.
Thus, embodiments of the present invention may redirect some of the requests 36 to mirror
server 40 or 42. However, there may be different network delays associated with each mirror
server. In the present example, if a request 36 is redirected to mirror server 38 (see reference
character 44), because both the end-user and the mirror server are in the Pacific region, and
there are few network gateways to cross, assume for purposes of discussion that the network
delay is negligible. If a request 36 is redirected to mirror server 40 (see reference character
46), because the request must pass through a gateway 26, assume for purposes of discussion
that the network delay is one unit. If a request 36 is redirected to mirror server 42 (see
reference character 48), because the request must pass through two gateways 26, assume for
purposes of discussion that the network delay is two units.

Because a single end-user cannot affect network traffic and network delays
substantially, changes to the network delay typically occur gradually over time and are a result
of the combined behavior of many end-users. Embodiments of the present invention take
advantage of the fact that at any one point in time, network delays are stable, by only
periodically recomputing user response times. These computed response times are used to
determine an optimal solution (assignment or mapping) over a given time period, until new
user response times are recomputed.

As described above, embodiments of the present invention also consider each
mirror server's load capacity and server delay. Referring again to FIG. 3, because the
network delay between end-users in the Pacific region 30 and mirror server 38 is negligible, it
would initially seem to make sense that all requests 36 from the Pacific region should be
redirected to mirror server 38. However, as mirror server 38 becomes overloaded with
requests, its load capacity may be exceeded, and its server delay may start to increase. To
ensure that the overall end-user response time is minimized, embodiments of the present
invention recompute response times, perform a heuristic algorithm (discussed later) to
generate an assignment or mapping, and redirect some of the requests to mirror server 40 (see
reference character 46) and mirror server 42 (see reference character 48) in accordance with
the mapping. In the example of FIG. 3, 50% of the requests are redirected to mirror server
38, 30% of the requests are redirected to mirror server 40, and 20% of the requests are
redirected to mirror server 42. The result is an overall minimized user response time.

It should be understood that the solution does take into account geographic or
regional information, which manifests itself in the computed response times. Thus,
embodiments of the present invention are more flexible than load-based approaches because they
do take into account geographic regions, and are more flexible than location-based
approaches because it is possible to redirect end-user requests across regional boundaries to
different mirror servers, if such a mapping will reduce overall response times.
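
As a worked illustration of the FIG. 3 example, the 50%/30%/20% split can be combined with
the network delays assumed above for purposes of discussion (negligible, one unit, and two
units for mirror servers 38, 40, and 42, respectively). The symbol $\overline{ndel}$ for the
average network delay per request is introduced here only for this illustration:

```latex
\overline{ndel} = 0.50 \times 0 + 0.30 \times 1 + 0.20 \times 2 = 0.7 \text{ units}
```

Under these illustrative figures, sending every request to mirror server 38 would incur no
network delay, so the split lowers the overall response time only if it reduces the average
server delay per request by more than 0.7 units.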

A more formal explanation of the assignment process will now be provided. In
more precise terms, if a content delivery system has:

• m servers, $S = \{S_1, \ldots, S_m\}$,

• n active customers, $C = \{C_1, \ldots, C_n\}$, and

• g geographic locations, $G = \{G_1, \ldots, G_g\}$,

then it is a goal of embodiments of the present invention to generate a mapping (i.e. an
assignment):

$$a : C \times G \times S \rightarrow [0, 1],$$

such that, if $a_{i,j,k} = \mu / 100$, then $\mu\%$ of the requests to the site of customer $C_i$
that are coming from geographic location $G_j$ are assigned to server $S_k$. For example, if a
content delivery system has a total of three servers, $S_1$, $S_2$, and $S_3$, then given a
customer $C_1$ in geographic region $G_1$, the mapping may be generated as follows:

$a_{1,1,1} = 0.50$,

$a_{1,1,2} = 0.30$, and

$a_{1,1,3} = 0.20$,

which indicates that 50% of the requests for customer $C_1$ in geographic region
$G_1$ will be redirected to server $S_1$, 30% of the requests for customer $C_1$ in geographic
region $G_1$ will be redirected to server $S_2$, and 20% of the requests for customer $C_1$ in
geographic region $G_1$ will be redirected to server $S_3$.
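
The way such a mapping might drive individual redirections can be sketched as follows. The
description does not specify how a single request is matched to a server, so the randomized
weighted choice below, the dictionary layout, and the name `pick_server` are all assumptions
made for illustration.

```python
import random

# Hypothetical representation: mapping[(customer, region)] -> {server: fraction}.
mapping = {("C1", "G1"): {"S1": 0.50, "S2": 0.30, "S3": 0.20}}


def pick_server(mapping, customer, region, rng=random):
    """Choose a server for one request with probability equal to its fraction."""
    fractions = mapping[(customer, region)]
    servers = list(fractions)
    weights = [fractions[server] for server in servers]
    return rng.choices(servers, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
counts = {"S1": 0, "S2": 0, "S3": 0}
for _ in range(10_000):
    counts[pick_server(mapping, "C1", "G1", rng)] += 1
print(counts)  # the shares approach 50% / 30% / 20% as the request count grows
```

A deterministic scheme (for example, weighted round-robin) would realize the same fractions
without randomness; the choice between the two is a design decision outside the scope of
this sketch.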

To produce such a mapping, embodiments of the present invention must have
knowledge about various aspects of the dynamic-content delivery network including, but not
limited to:

• the network delay $ndel_{j,k}$ required for the delivery of typical
dynamic content (~4 kB size) from server $S_k$ to geographic
location $G_j$;

• the server delay $sdel_k$ required for the servicing of a typical
dynamic content (~4 kB size) request from server $S_k$ (note that
this delay may increase as the load of the server increases, so
$sdel_k$ can be a function of the server load);

• the load capacity $s_k$ of server $S_k$ beyond which the server
becomes too overloaded to be useful; and

• the load requirement of customer $C_i$ generated by the
end-users located at location $G_j$.

If the entire load requirement of customer $C_i$ is denoted as $c_i$, the portion of this
generated by end-users located at $G_j$ is equal to $u_{i,j} \times c_i$. In other words,
$u_{i,j}$ = the fraction of requests coming to customer $C_i$ from
geographic region $G_j$, and thus

$u_{i,j} \times c_i$ = all requests coming to customer $C_i$ from geographic
region $G_j$.

Note that given these aspects of a content delivery system according to
embodiments of the present invention, certain constraints exist:

• The content delivery system shall not assign more load to the
servers than they can handle. In other words, for each $S_k \in S$,

$$\sum_{i=1}^{n} \sum_{j=1}^{g} a_{i,j,k} \times u_{i,j} \times c_i \leq s_k.$$

• Every end-user request shall be assigned to some server. For
each $C_i \in C$, $G_j \in G$,

$$\sum_{k=1}^{m} a_{i,j,k} = 1.0.$$

• An end-user request shall not be assigned to a server which
does not have the required content. Therefore, if there are
many requests coming to a customer, yet there are no suitable
servers for that customer, then embodiments of the present
invention may need to migrate corresponding data to a
suitable server.

Note also that, by definition, the $a_{i,j,k}$ values of the mapping cannot be negative.
That is, for each $C_i \in C$, $G_j \in G$, $S_k \in S$:

$$a_{i,j,k} \geq 0.$$

Therefore, while choosing an assignment with $n \times g \times m$ variables ($a_{i,j,k}$),
embodiments of the present invention must consider $m + n \times g \times m + n \times g$
linear constraints.
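
The three families of constraints (server capacity, complete assignment, and
non-negativity) can be checked for a candidate mapping with a short routine such as the
following sketch. The function name `check_constraints`, the nested-list layout, and the
tolerance parameter `tol` are assumptions introduced for illustration; the content-availability
requirement is not modeled.

```python
def check_constraints(a, u, c, s, tol=1e-9):
    """Return a list of violated constraints (empty if the mapping is feasible).

    a[i][j][k], u[i][j], c[i], and s[k] follow the symbols used in the text,
    with 0-based indices. tol absorbs floating-point rounding.
    """
    n, g, m = len(c), len(u[0]), len(s)
    violations = []

    # m capacity constraints: the load assigned to each server must not exceed s[k].
    for k in range(m):
        load = sum(a[i][j][k] * u[i][j] * c[i] for i in range(n) for j in range(g))
        if load > s[k] + tol:
            violations.append(("capacity", k))

    for i in range(n):
        for j in range(g):
            # n x g assignment constraints: the fractions must sum to 1.
            if abs(sum(a[i][j]) - 1.0) > tol:
                violations.append(("assignment", i, j))
            # n x g x m non-negativity constraints.
            for k in range(m):
                if a[i][j][k] < -tol:
                    violations.append(("non-negative", i, j, k))
    return violations


a = [[[0.50, 0.30, 0.20]]]  # one customer, one region, three servers
print(check_constraints(a, u=[[1.0]], c=[100.0], s=[60.0, 40.0, 40.0]))
print(check_constraints(a, u=[[1.0]], c=[100.0], s=[40.0, 40.0, 40.0]))
```

In the second call the first server would receive 50 requests against an assumed capacity of
40, so the routine reports a capacity violation for that server.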

According to embodiments of the present invention, the large number of
constraints as compared to the number of variables to consider in computing a solution results
in multiple solutions that yield the same result. The complication caused by these multiple
solutions is that when a particular solution is recomputed, a new solution may be chosen which
significantly changes all the previous mappings. New mappings, while not catastrophic, may
lead to increased delays as new connections must be made across geographic boundaries, and
the system performance may not be stable. Thus, in alternative embodiments, another
constraint on the solution is that, given more than one possible optimum solution, a solution
shall be selected that minimizes the differences between the previous mapping and the new
mapping.
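
A minimal sketch of such a tie-breaking step is shown below. The description does not define
how the difference between two mappings is measured; the sum of absolute differences used
here, and the names `mapping_distance` and `pick_most_stable`, are assumptions chosen for
illustration.

```python
def mapping_distance(old, new):
    """Sum of absolute differences between two mappings a[i][j][k] (an assumed metric)."""
    total = 0.0
    for old_i, new_i in zip(old, new):
        for old_ij, new_ij in zip(old_i, new_i):
            for old_a, new_a in zip(old_ij, new_ij):
                total += abs(old_a - new_a)
    return total


def pick_most_stable(previous, candidates):
    """Among equally good candidate mappings, keep the one closest to the previous mapping."""
    return min(candidates, key=lambda candidate: mapping_distance(previous, candidate))


previous = [[[0.5, 0.3, 0.2]]]
candidates = [
    [[[0.2, 0.3, 0.5]]],  # moves 30% of the requests between servers
    [[[0.5, 0.2, 0.3]]],  # moves 10% of the requests between servers
]
print(pick_most_stable(previous, candidates))
```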

Under certain conditions, the various constraints that must be considered in
order to generate a solution can be specified as linear constraints, which are constraints that
can be specified by variables that may only be multiplied by constants, not multiplied by each
other or raised to a power. Because the constraints can be specified as linear constraints, in
alternative embodiments of the present invention, a linear constraint solver may be used, a
complex computation well-understood by those skilled in the art. However, solving linear
constraints can require a substantial amount of time.

In further alternative embodiments of the present invention, the various
constraints may be specified in terms of nonlinear constraints, and a nonlinear constraint solver
can be used to produce a solution. However, solving nonlinear constraints is generally much
less efficient than solving linear constraints, and therefore is generally slower. As will be
described later, when a heuristic algorithm is used according to preferred embodiments of the
present invention, the constraints may be specified as nonlinear constraints, and yet a solution
can be generated in a shorter amount of time.

Performance Tuning for Dynamic-Content Delivery Networks
Response Time 

As discussed above, each mirror server has an associated expected server delay
time, which is usually a function of its load. FIG. 4 illustrates a graph of an example mirror
server delay time characteristic. The mirror server $S_k$ provides a relatively constant response
time up to some load $s_k$, beyond which the server delay time jumps suddenly (see reference
character 50). Therefore, as long as the load of the mirror server is kept below $s_k$, the server
delay can be estimated as a constant or specified as a linear constraint, either the typical delay
of the server ($sdel_1$) or its worst case delay time ($sdel_2$).

In preferred embodiments of the present invention, to increase the accuracy of
the estimated server delay, the server load constraint previously described may be adjusted to,
for example, 80% of the maximum server load capacity in order to remain below this $s_k$
threshold and remain in the linear region. However, once a solution is computed with that
constraint and that solution is in place for the next period of time, it is possible for the actual
server load to fluctuate above or below $s_k$.
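
The effect of planning against a reduced capacity can be sketched as follows. The function
names and the numeric values are assumptions for illustration; only the 80% figure comes
from the example above.

```python
def planning_capacity(s_k, safety_factor=0.8):
    """Capacity used in the load constraint, e.g. 80% of the maximum capacity s_k."""
    return safety_factor * s_k


def stays_linear(planned_load, fluctuation, s_k):
    """True if the planned load plus a fluctuation remains at or below s_k."""
    return planned_load + fluctuation <= s_k


s_k = 1000.0                        # assumed maximum load capacity
planned = planning_capacity(s_k)    # about 800 requests under the adjusted constraint
print(stays_linear(planned, 150.0, s_k))  # a modest surge stays in the linear region
print(stays_linear(planned, 250.0, s_k))  # a larger surge pushes the load past s_k
```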

The average prioritized observed response time (AORT) of the dynamic-content
delivery network can be defined as follows:

$$AORT = \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} \sum_{k=1}^{m} a_{i,j,k} \times u_{i,j} \times c_i \times p_i \times (sdel_k + ndel_{j,k})}{\sum_{i=1}^{n} c_i \times p_i}$$

where $sdel_k$ is the server delay, $ndel_{j,k}$ is the network delay observed by customers in
geographic location $G_j$ while accessing server $S_k$, and $p_i$ is the priority of the customer
(based on its service agreements). For purposes of simplifying the discussion, the examples
presented herein will assume that all customers have the same priority; i.e., for all
$C_i \in C$, $p_i = 1$. However, it should be noted that embodiments of the present invention
can take into account differing priority values $p_i$.
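
The AORT definition can be evaluated directly from its components, as in the following
sketch. The function name `aort` and the nested-list layout are assumptions, and the
denominator is taken to be the priority-weighted total load, the sum of $c_i \times p_i$ over
all customers, which is itself an assumption about the intended form of the equation.

```python
def aort(a, u, c, p, sdel, ndel):
    """Average prioritized observed response time.

    a[i][j][k]: fraction of customer i's region-j requests sent to server k;
    u[i][j]: fraction of customer i's load from region j; c[i]: total load;
    p[i]: priority; sdel[k]: server delay; ndel[j][k]: network delay.
    """
    n, g, m = len(c), len(ndel), len(sdel)
    numerator = sum(
        a[i][j][k] * u[i][j] * c[i] * p[i] * (sdel[k] + ndel[j][k])
        for i in range(n) for j in range(g) for k in range(m)
    )
    denominator = sum(c[i] * p[i] for i in range(n))  # assumed form of the denominator
    return numerator / denominator


# One customer, one region, three servers; server delay of 1 unit everywhere
# and network delays of 0, 1, and 2 units (illustrative values only).
value = aort(
    a=[[[0.50, 0.30, 0.20]]], u=[[1.0]], c=[100.0], p=[1.0],
    sdel=[1.0, 1.0, 1.0], ndel=[[0.0, 1.0, 2.0]],
)
print(round(value, 6))
```

For these illustrative inputs the numerator is 50 × 1 + 30 × 2 + 20 × 3 = 170 and the
denominator is 100, giving an AORT of 1.7 delay units.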

In addition to AORT, individual observed response times ($AORT_i$s) can be
defined for the clients (end-users) of individual customers:

$$AORT_i = \frac{\sum_{j=1}^{g} \sum_{k=1}^{m} a_{i,j,k} \times u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})}{c_i}$$
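
The two definitions are related. Assuming that the denominator of AORT is the
priority-weighted total load $\sum_{i} c_i \times p_i$, and noting that $c_i \times AORT_i$ is
the double sum over regions and servers that appears in the numerator of $AORT_i$,
substitution into the numerator of AORT gives:

```latex
AORT = \frac{\sum_{i=1}^{n} p_i \times c_i \times AORT_i}{\sum_{i=1}^{n} c_i \times p_i}
```

In other words, under that assumption AORT is the average of the per-customer values
$AORT_i$, weighted by each customer's load and priority.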



Note that these definitions of response time are linear. Thus, they can be 
minimized/solved using a linear optimization technique, such as simplex. However, if the 
server delay cannot be treated as a constant, then the definition is not linear and a non-linear 
optimization technique (which is generally much more expensive) will need to be utilized. 

For purposes of understanding the equation, the components of $AORT_i$ will now
be described. As noted above, $sdel_k$ is the server delay of server $S_k$, and $ndel_{j,k}$ is
the network delay between geographic location $G_j$ and server $S_k$. Thus,
$sdel_k + ndel_{j,k}$ represents the total delay seen by requests coming from geographic region
$G_j$ to server $S_k$. Furthermore, $c_i$ is the total load of customer $C_i$, and $u_{i,j}$ is
the fraction of requests coming to customer $C_i$ from region $G_j$. Thus,
$u_{i,j} \times c_i$ equals the number of requests coming to customer $C_i$ from region $G_j$.
In addition, $a_{i,j,k}$ represents the fraction of requests coming to customer $C_i$ from
region $G_j$ that have been redirected to server $S_k$. Thus,
$a_{i,j,k} \times u_{i,j} \times c_i$ equals all requests coming to customer $C_i$ from region
$G_j$ that have been assigned to server $S_k$. When this result is multiplied by
$sdel_k + ndel_{j,k}$, the result is the total delay of all requests coming to customer $C_i$
from region $G_j$ that have been assigned to server $S_k$. This delay is then summed up over
all servers and all geographic locations. Thus, the numerator of $AORT_i$ represents the total
delay for all requests coming to customer $C_i$ from all regions that have been assigned to all
servers. When the numerator of $AORT_i$ is divided by $c_i$, which is the total number of
requests to customer $C_i$, the average response time for all of those requests can be computed.

In alternative embodiments of the present invention, response times can be
minimized by minimizing both AORT and $AORT_i$ using a linear constraint solver. However,
the time it takes to produce a solution using a linear constraint solver may be prohibitive. For
example, in embodiments of the present invention, the system may recompute a new solution
after a certain period of time, such as, for example, every 30 seconds. Once a present solution
is computed, requests occurring within the next 30-second period would then follow the
previously computed solution. However, it may take up to one hour or more to compute a
solution using a linear constraint solver. Again, it should be understood that solutions are not
computed based on individual requests. Instead, the overall response time of end-user requests
is monitored, and if the overall response time begins to rise, the system may compute a new
solution which redirects some of those requests to another location.

Server Load Balancing 

In addition to the linear constraints previously described, which produce the
lowest overall response times, in preferred embodiments of the present invention additional
linear constraints may be imposed to balance the load of the mirror servers. While load
balancing is not necessarily required to produce the lowest overall response times, it can be a
factor in certain situations. For example, referring again to FIG. 4, a solution for minimizing
response times may result in all mirror servers having a server load of less than S_k, but without
load balancing the server load for individual mirror servers may not be balanced. In other
words, one mirror server may have a server load that is very close to S_k, while another mirror
server may have a server load far away from S_k. This can lead to problems, because a mirror
server that has a server load close to S_k may encounter load fluctuations that push its server
load above S_k, resulting in excessive response times. By adding a load balancing constraint to
the problem to be solved, all of the mirror server loads will be as far away from S_k as possible,
and load fluctuations will be less likely to produce excessive response times.
A more formal description of the load balancing requirement will now be
provided. Given Θ, which describes the maximum allowed deviation from balanced mirror
servers, then for all pairs of servers, S_k and S_l, the following constraint must be satisfied:

$$\frac{\sum_{i=1}^{n}\sum_{j=1}^{g} a_{i,j,k}\times u_{i,j}\times c_i}{\sum_{i=1}^{n}\sum_{j=1}^{g} a_{i,j,l}\times u_{i,j}\times c_i} < (1+\Theta)\times\frac{s_k}{s_l}.$$

The appropriate value of Θ can either be application-dependent or can be
found using a binary search until a feasible set of inequalities is discovered. Note that this
adds m×(m−1) more inequalities to the system.
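
The binary search mentioned above can be sketched as follows. This is a minimal illustration, not
the implementation of the embodiments: `is_feasible` is a hypothetical caller-supplied function that
runs the constraint solver for a given tolerance and reports whether a solution exists, and the
search bounds and stopping tolerance are assumed values.

```python
def smallest_feasible_theta(is_feasible, lo=0.0, hi=1.0, tol=1e-3):
    """Binary-search the smallest imbalance tolerance for which a solution exists.

    is_feasible is a hypothetical callback wrapping the constraint solver.
    Feasibility is assumed monotone: loosening the tolerance never removes solutions.
    """
    if not is_feasible(hi):
        return None  # even the loosest tolerance tried admits no solution
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_feasible(mid):
            hi = mid  # feasible: try a tighter tolerance
        else:
            lo = mid  # infeasible: loosen the tolerance
    return hi
```

The monotonicity assumption holds for the constraint as stated, because a larger tolerance only
enlarges the set of acceptable load ratios.
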
For purposes of understanding the above equation, its components will now be
described. It should be understood that the equation represents the load balance between two
mirror servers S_k and S_l. Once again, c_i is the total load of customer C_i, u_{i,j} is the percent of
requests coming to customer C_i from region G_j, and a_{i,j,k} represents the fraction of requests
coming to customer C_i from region G_j that have been redirected to server S_k. Thus,
a_{i,j,k} × u_{i,j} × c_i equals all requests coming to customer C_i from region G_j that have been
assigned to server S_k. In the numerator, this value is summed for all regions and all customers
to generate a total load for mirror server S_k. The same thing is done in the denominator to come
up with a total load for mirror server S_l. The left side of the equation is therefore a ratio of
the loads of mirror server S_k and mirror server S_l. That ratio must be less than a certain
threshold represented on the right side of the equation by (1 + Θ), where Θ is a fractional value
representing how much of an imbalance will be tolerated.
In addition, it should be noted that the right side of the equation also includes a
ratio s_k/s_l, where s_k represents the load capacity of server S_k and s_l represents the load capacity
of server S_l. The reason for having this ratio in the equation is that not all mirror servers have
the same load capabilities, and it does not make sense to equalize the absolute loads of two
mirror servers when their load capabilities differ. Thus, for example, if server S_k has much
greater load capability than server S_l, a balanced pair of mirror servers would have each mirror
server operating at, say, 60% of maximum even though mirror server S_k may be processing many
more end-user requests than server S_l.
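
As an illustration of the pairwise constraint, the following minimal sketch computes each server's
total load and checks the capacity-scaled ratio test for every ordered pair of servers. The nested
list layout (`a[i][j][k]`, `u[i][j]`, `c[i]`) and the function names are assumptions made for this
example only.

```python
from itertools import permutations

def server_loads(a, u, c, m):
    """Total load per server: sum over i, j of a[i][j][k] * u[i][j] * c[i]."""
    loads = [0.0] * m
    for i, c_i in enumerate(c):
        for j, u_ij in enumerate(u[i]):
            for k in range(m):
                loads[k] += a[i][j][k] * u_ij * c_i
    return loads

def is_balanced(loads, capacities, theta):
    """Check load_k / load_l < (1 + theta) * s_k / s_l for every ordered pair."""
    for k, l in permutations(range(len(loads)), 2):
        # Cross-multiplied form of the ratio test; avoids dividing by a zero load.
        if loads[k] * capacities[l] >= (1 + theta) * capacities[k] * loads[l]:
            return False
    return True
```

With loads of 60 and 40 on two servers of equal capacity, a tolerance of 0.2 is violated, whereas
the same loads on servers with capacities 150 and 100 pass, because both servers then run at the
same fraction of their capacity.
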

Alternatively, if the previously described response time constraints are too
expensive to compute or the network delays are insignificant, the load imbalance may be
explicitly minimized. In that case, instead of minimizing the AORT function, we would
minimize an imbalance function (IMBL) as described below:

$$\mathrm{IMBL} = \sum_{k=1}^{m}\sum_{l=k+1}^{m}\bigl|\mathrm{load}(S_k)-\mathrm{load}(S_l)\bigr|,$$

where

$$\mathrm{load}(S_k) = \sum_{i=1}^{n}\sum_{j=1}^{g} a_{i,j,k}\times u_{i,j}\times c_i$$

is the load of the server S_k.
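
A minimal sketch of the imbalance function follows. It assumes that the pairwise difference is
taken in absolute value, which is an interpretation of the formula rather than something the text
states explicitly, and it takes the per-server loads as an already computed list.

```python
def imbalance(loads):
    """Sum of |load(S_k) - load(S_l)| over all unordered server pairs k < l.

    The absolute value is an assumption of this sketch.
    """
    m = len(loads)
    return sum(abs(loads[k] - loads[l]) for k in range(m) for l in range(k + 1, m))
```

Perfectly even loads give an imbalance of zero, and the value grows as the loads spread apart.
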

Maintaining Stability of the Dynamic-Content Delivery Network

As discussed earlier, the system parameters (such as the customer load
requirements) that affect a content delivery system can change often, and thus such systems
must adapt to changes quickly. In addition, when a new solution is computed, preferred
embodiments of the present invention compute a solution that is as close as possible to the
previous solution. Furthermore, the adaptation should cause minimal (or no) disruption to
service provided to existing customers. For instance:

• an increase in the resources due to the addition of a new
server or the service termination of a customer should not
cause a load redistribution, unless it violates the load-balance
constraint Θ,

• a new customer should be accepted into the system only if the
average observed response time for the existing customers
does not increase too much. That is, given the old and new
values of the AORT function,

$$\mathrm{AORT}_{old} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{g}\sum_{k=1}^{m} a_{i,j,k}\times u_{i,j}\times c_i\times(\mathit{sdel}_k+\mathit{ndel}_{j,k})}{\sum_{i=1}^{n} c_i}, \text{ and}$$

$$\mathrm{AORT}_{new} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{g}\sum_{k=1}^{m} a'_{i,j,k}\times u_{i,j}\times c_i\times(\mathit{sdel}_k+\mathit{ndel}_{j,k})}{\sum_{i=1}^{n} c_i},$$

the system should ensure that

$$\mathrm{AORT}_{new} < (1+\Phi)\times\mathrm{AORT}_{old},$$

where a'_{i,j,k} is the new assignment and Φ is the allowable change
in the average response time for the existing clients. Note that
although this guarantees that, on the average, customers will not
observe a big change in response time, it does not bound the change
observed by individual customers. Hence, customer-specific
response time change constraints may also be used:

$$\mathrm{AORT}_{i,new} < (1+\Phi_i)\times\mathrm{AORT}_{i,old},$$

where Φ_i is the allowable fractional change in the average
response time of the customer C_i. Note that Φ_i is a value that is
initially selected, such as 0.05 (5%). However, if there is no
solution that can satisfy the 5% change constraint, preferred
embodiments of the present invention will automatically increase
the value of Φ_i by a certain amount until a solution is found.

• changes in the request rates (or loads) of existing customers
should not affect the average observed response time for
existing customers. This constraint can be defined similarly to
the constraints of the previous item.
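
The stability constraint and the automatic relaxation of the allowable change can be sketched as
follows. This is an illustration under stated assumptions: AORT is taken to be the load-weighted
delay divided by the total customer load, `solve` is a hypothetical caller-supplied function that
returns an assignment (or `None`) for a given allowable change, and the step size and upper limit
are invented for the example.

```python
def aort(a, u, c, sdel, ndel):
    """Average observed response time: load-weighted delay divided by total load."""
    total = 0.0
    for i, c_i in enumerate(c):
        for j, u_ij in enumerate(u[i]):
            for k, sdel_k in enumerate(sdel):
                total += a[i][j][k] * u_ij * c_i * (sdel_k + ndel[j][k])
    return total / sum(c)

def relax_until_feasible(solve, phi=0.05, step=0.05, max_phi=1.0):
    """Raise the allowable change in fixed steps until solve() finds an assignment.

    solve is a hypothetical callback; step and max_phi are assumed values.
    Returns (phi_used, assignment), or None if no tolerance up to max_phi works.
    """
    n = 0
    while phi + n * step <= max_phi:
        current = phi + n * step
        assignment = solve(current)
        if assignment is not None:
            return current, assignment
        n += 1
    return None
```

Starting from 5%, a solver that only succeeds once the allowable change reaches 12% would be
retried at 5%, 10%, and 15%, and the 15% solution would be returned.
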
Techniques for Calculating the Assignment Mapping

As discussed earlier, the above constraints can generally be described in the
form of linear constraints. Hence, in alternative embodiments of the present invention a
general purpose linear optimization tool, such as Maple software (which includes simplex), can
be used to solve the constraints optimally. If the mirror server delay is not constant, however,
as illustrated in FIG. 4, then the constraints will be non-linear, and a non-linear constraint
optimization technique will have to be used. However, because the number of variables and
constraints grows rapidly as the size of the problem increases, both of these options can be
very expensive. In addition, these options do not scale up well.

Thus, in preferred embodiments of the present invention an efficient heuristic
algorithm is implemented that produces close-to-optimal results with a short execution time. In
preferred embodiments, the heuristic approach is capable of solving the previously described
constraints AORT and AORT n, the previously described load balancing constraints, and the
increase in response time constraints.

Intuition

The intuition behind the heuristic algorithm according to preferred embodiments
of the present invention will now be described. Considering again the average response time
AORT,

$$\mathrm{AORT} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{g}\sum_{k=1}^{m} a_{i,j,k}\times u_{i,j}\times c_i\times(\mathit{sdel}_k+\mathit{ndel}_{j,k})}{\sum_{i=1}^{n} c_i},$$

it can be seen that one way to prevent AORT from growing too fast is to limit
the summation values in the numerator by assigning large a_{i,j,k} values to small
u_{i,j} × c_i × (sdel_k + ndel_{j,k}) values. Such an assignment will cause the individual terms in
the summation, and thus the overall AORT, to be small in general. For example, assume for
purposes of illustration only that the u_{i,j} × c_i × (sdel_k + ndel_{j,k}) value for the i,j,k
triple 2,2,3 is 50. Also assume that there is another i,j,k triple 2,1,4 that produces a
u_{i,j} × c_i × (sdel_k + ndel_{j,k}) value of 24. In addition, assume that a value of 0.2 or
0.4 can be assigned to a_{i,j,k}. By assigning the larger a_{i,j,k} value (0.4) to the smaller
u_{i,j} × c_i × (sdel_k + ndel_{j,k}) value (24), a smaller overall response time will be produced.
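
The arithmetic behind this example can be written out explicitly. Pairing the larger fraction with
the smaller delay term contributes

$$0.4\times 24 + 0.2\times 50 = 9.6 + 10 = 19.6$$

to the numerator of AORT, whereas the opposite pairing contributes

$$0.2\times 24 + 0.4\times 50 = 4.8 + 20 = 24.8,$$

so the first pairing yields the smaller sum.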

Iterative Merge-based Technique 

The heuristic algorithm according to embodiments of the present invention is
iterative. At each iteration, at least four sorted lists are created:

• C_i values are sorted in increasing order of c_i (i.e., customers
are sorted in order of increasing server load),

• (C_i, G_j) pairs are sorted in increasing order of u_{i,j} (i.e.,
customer and region pairs are sorted in order of increasing
fractional amounts of requests coming to the customer i from
region j),

• S_k values are sorted in increasing order of sdel_k (i.e., servers
are sorted in order of increasing server delay), and

• (G_j, S_k) pairs are sorted in increasing order of ndel_{j,k} (i.e.,
server and region pairs are sorted in order of increasing
network delay). Note that if ndel_{j,k} is not constant but rather a
function of the mirror server load, this term needs to be
adjusted at each iteration. Thus, the heuristic algorithm
according to preferred embodiments extends to non-linear
cases of the problem as well.

In addition, in preferred embodiments, in order to promote load balancing,

• S_k values are also sorted in decreasing order of remaining load
capacity s_k (i.e., servers are sorted in decreasing order of
remaining load capacity).

Note that all of these sorted values were measured at a certain point in time,
prior to the recomputation of the present solution.
Once the sorted lists are generated, the heuristic algorithm according to
embodiments of the present invention performs a sequence of iterative steps. At each iteration,
the top-most (C_i, G_j, S_k) triple with the smallest u_{i,j} × c_i × (sdel_k + ndel_{j,k}) value is
selected through a merging operation. The server S_k of the selected (C_i, G_j, S_k) triple is then
assigned the remaining load from the (C_i, G_j) pair. If the load capacity of the server S_k is not
sufficient to handle the remaining load, then the remaining capacity of the server S_k is used for
the (C_i, G_j) pair, and the unassigned portion of the pair is reinserted into the iterative process.

An example environment including three customers, three regions, and three
servers will now be presented for purposes of explaining this iterative merge operation.
Assume that this example environment results in the sorted lists c_i, u_{i,j}, sdel_k, ndel_{j,k}, and
s_k as illustrated in FIG. 5. First, as discussed above, the smallest
u_{i,j} × c_i × (sdel_k + ndel_{j,k}) value is selected by taking the top item in each of the leftmost
four lists, and finding the smallest comparable item in each of the remaining leftmost four lists.

Thus, the top item in the first list may be selected, which happens to be c_2 in
this example. The selection of c_2 means that comparable items in each of the remaining lists
u_{i,j}, sdel_k, and ndel_{j,k} must have i=2. In preferred embodiments of the present invention,
when the top item in the c_i list is selected first, the process of finding comparable items moves
from list to list in the order c_i → u_{i,j} → ndel_{j,k} → sdel_k. Therefore, in list u_{i,j}, the
comparable item is the highest entry in the list with i=2, or u_{2,3}. The selection of u_{2,3} adds an
additional restriction in that comparable items in each of the remaining lists sdel_k and ndel_{j,k}
must have i=2 and j=3. In list ndel_{j,k}, the comparable item is the highest entry in the list with
j=3, or ndel_{3,3}. The selection of ndel_{3,3} adds an additional restriction in that the comparable
item in the remaining list sdel_k must have k=3, or sdel_3. From these four comparable items c_2,
u_{2,3}, ndel_{3,3}, and sdel_3 (see reference character 52) a value for
u_{i,j} × c_i × (sdel_k + ndel_{j,k}) using the triple (2,3,3) can be computed.

This same process is repeated for the top item in the second list (u_{3,1}). In
preferred embodiments of the present invention, when the top item in the u_{i,j} list is selected
first, the process of finding comparable items moves from list to list in the order
u_{i,j} → ndel_{j,k} → sdel_k → c_i, resulting in comparable items c_3, u_{3,1}, ndel_{1,1}, and sdel_1
(see reference character 54). From these comparable items a value for
u_{i,j} × c_i × (sdel_k + ndel_{j,k}) using the triple (3,1,1) can be computed. It should be noted
that in alternative embodiments, the process of finding comparable items may move from list to
list in the order u_{i,j} → ndel_{j,k} → c_i → sdel_k, or in an order that begins u_{i,j} → c_i.

This same process is then repeated for the top item in the third list (ndel_{1,1}). In
preferred embodiments of the present invention, when the top item in the ndel_{j,k} list is selected
first, the process of finding comparable items moves from list to list in the order
ndel_{j,k} → sdel_k → u_{i,j} → c_i, resulting in comparable items c_3, u_{3,1}, ndel_{1,1}, and sdel_1
(see reference character 58). Note that the comparable items happen to be the same as the second
group of comparable items. From these comparable items a value for
u_{i,j} × c_i × (sdel_k + ndel_{j,k}) using the triple (3,1,1) can be computed. It should be noted
that in alternative embodiments, the process of finding comparable items may move from list to
list in the order ndel_{j,k} → u_{i,j} → sdel_k → c_i, or ndel_{j,k} → u_{i,j} → c_i → sdel_k.

Finally, this same process is repeated for the top item in the fourth list (sdel_3).
In preferred embodiments of the present invention, when the top item in the sdel_k list is
selected first, the process of finding comparable items moves from list to list in the order
sdel_k → ndel_{j,k} → u_{i,j} → c_i, resulting in comparable items c_2, u_{2,3}, ndel_{3,3}, and sdel_3
(see reference character 56). Note that the comparable items happen to be the same as the first
group of comparable items. From these comparable items a value for
u_{i,j} × c_i × (sdel_k + ndel_{j,k}) using the triple (2,3,3) can be computed.

Next, the triple having the smallest merged delay value is identified.
Continuing the above example, if the u_{i,j} × c_i × (sdel_k + ndel_{j,k}) merged delay values for the
triples (2,3,3) and (3,1,1) are 10 units and 5 units, respectively, then the triple (3,1,1) is
selected. As a result, server S_1 is assigned the load coming to customer C_3 from region G_1, and
a particular a_{i,j,k} value is assigned to the triple (3,1,1). The result of assigning an a_{i,j,k}
value to a particular merged delay value is that an assignment of a particular customer to a
particular server in a particular region has been made.

If server S_1 has sufficient load capacity s_1 to handle all of the assigned load, then
the a_{i,j,k} value is maximized (=1.0) and AORT_1 is minimized, and the goal of assigning large
a_{i,j,k} values to small u_{i,j} × c_i × (sdel_k + ndel_{j,k}) values is satisfied. The c_i and u_{i,j}
values are then recomputed taking into account this assignment. For example, because all of the
load coming to customer C_3 from region G_1 has now been accounted for, the value of u_{3,1} is now
zero, and thus u_{3,1} is removed from the u_{i,j} list. Furthermore, because some of the load for
customer C_3 has now been accounted for, the value of c_3 is reduced. It should be noted that
despite the initial assignment, sdel_k and ndel_{j,k} do not change, because they are assumed to be
constant.

If server S_1 does not have sufficient load capacity s_1 to handle all of the assigned
load, then the a_{i,j,k} value is computed as a value between zero and one, and a particular AORT_1
value is computed. The remaining capacity of the server S_1 is then assigned to handle some of
the load coming to customer C_3 from region G_1. As a result, the value of s_1 is now zero, and
thus s_1 is removed from the s_k list. Furthermore, because some of the load coming to customer
C_3 from region G_1 has now been accounted for, the values of u_{3,1} and c_3 are reduced.

Once these assignments have been made and the values for c_i, u_{i,j}, and s_k are
recomputed, the lists for c_i, u_{i,j}, and s_k are re-sorted. The process of finding comparable items
for the top item in each list is repeated, the (C_i, G_j, S_k) triple with the smallest value is
identified, server loads are assigned, and a_{i,j,k} and AORT_1 values are computed. This iterative
process is repeated until all loads coming to all customers C_i from all regions G_j have been
assigned to a server S_k, where possible.
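
The overall iteration can be illustrated with a simplified sketch. Note that this is not the
merge over sorted lists described above: it replaces the merge with a direct scan for the
remaining triple with the smallest merged delay, it tracks the unassigned load of each
(customer, region) pair as a single number instead of recomputing c_i and u_{i,j} separately, and
it ignores the load balancing list. The function name and the list-based data layout are
assumptions of this example.

```python
def greedy_assign(c, u, sdel, ndel, capacity):
    """Return a[i][j][k], the fraction of (customer i, region j) load sent to server k."""
    n, g, m = len(c), len(u[0]), len(sdel)
    # Unassigned load per (customer, region) pair and free capacity per server.
    remaining = [[u[i][j] * c[i] for j in range(g)] for i in range(n)]
    free = list(capacity)
    a = [[[0.0] * m for _ in range(g)] for _ in range(n)]
    eps = 1e-12
    while True:
        # Merged delay of every triple that still has load to place and room to take it.
        candidates = [
            (remaining[i][j] * (sdel[k] + ndel[j][k]), i, j, k)
            for i in range(n) for j in range(g) for k in range(m)
            if remaining[i][j] > eps and free[k] > eps
        ]
        if not candidates:
            break
        _, i, j, k = min(candidates)
        # Assign as much as the server can take; any rest stays in the process.
        moved = min(remaining[i][j], free[k])
        a[i][j][k] += moved / (u[i][j] * c[i])
        remaining[i][j] -= moved
        free[k] -= moved
    return a
```

Each pass either exhausts one (customer, region) pair or fills one server, so the loop ends after
at most n×g + m passes. When a server fills up partway through a pair, the resulting fraction lies
between zero and one and the rest of the pair is placed on the next best server.
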

It should be noted that the example previously discussed did not take into
account the s_k list. However, in preferred embodiments of the present invention, load
balancing can be taken into account by including the sorted s_k list in the process of finding
comparable items in the lists. In preferred embodiments, the process of finding comparable
items by proceeding through any of the list-to-list orders discussed above will include the list s_k
at the end of the order.

Data Migration 

The previous discussion assumed that all mirror servers contained a copy of the
content requested by end-users. Suppose, however, that an assignment is made where end-users
are assigned to fetch content from a particular mirror server that does not have the
requested content. Generally, that assignment should not have been made. However, if the
assignment would result in lower overall response times, the assignment may
nevertheless be worthwhile, if the requested content can be copied to that mirror server. The
copying or migration of data represents a penalty, and thus assignments that require data
migration should be made only where necessary. To minimize the number of assignments to
servers without the requested data, in preferred embodiments of the present invention another
list can be created, comprising:

• (C_i, S_k) pairs, unsorted (i.e., customer and server mappings in
which the content of customer i is stored in server k).

The list of (C_i, S_k) pairs is unsorted, and merely represents all (C_i, S_k) pairs in
which the content of customer i is stored in server k. Once a (C_i, G_j, S_k) triple representing
minimal merged delay is identified as described above, this list is consulted to ensure that the
assigned triple does not result in customer loads being assigned to mirror servers that do not
contain the requested content.

If the iterative process fails to find a solution (that is, if no suitable server can
be found), a suitable candidate ((C_i, G_j, S_k) triple) with the smallest
u_{i,j} × c_i × (sdel_k + ndel_{j,k}) is chosen and the data of customer C_i is migrated to server S_k.
In preferred embodiments, the time penalty associated with migrating data to a server S_k may be
quantified. Thus, when choosing a suitable candidate ((C_i, G_j, S_k) triple) with the smallest
u_{i,j} × c_i × (sdel_k + ndel_{j,k}), the server S_k having the smallest penalty may be selected.
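
The content check and the migration fallback can be sketched as follows. This is an illustration
only: the tuple layout `(merged_delay, i, j, k)`, the `has_content` set of (customer, server)
pairs, and the per-server `migration_penalty` mapping are hypothetical structures chosen for the
example.

```python
def pick_triple(candidates, has_content, migration_penalty):
    """Choose a (merged_delay, i, j, k) candidate, preferring servers that hold the content.

    has_content is a set of (customer, server) pairs; migration_penalty maps a
    server index to an assumed time penalty. Returns (triple, needs_migration).
    """
    hosted = [t for t in candidates if (t[1], t[3]) in has_content]
    if hosted:
        return min(hosted), False
    # No candidate server holds the content: fall back to migration, taking the
    # smallest merged delay and breaking ties by the smallest migration penalty.
    best = min(candidates, key=lambda t: (t[0], migration_penalty[t[3]]))
    return best, True
```

In this sketch a slower triple whose server already holds the customer's content is preferred over
a faster triple that would require copying data, and migration is used only when no hosted
candidate exists.
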
Response Time Estimation

As the preceding equations illustrate, the heuristic algorithms according to
embodiments of the present invention require that the server delay sdel_k and network delay
ndel_{j,k} observed by end-users accessing a particular mirror server and located at a given
geographic location be quantified. Server delay, which is the time it takes for a mirror server
to process an end-user request, can be directly determined from the mirror server itself.
Because the mirror servers are part of the content delivery system, measuring server delay is
relatively straightforward.

Network delay is more difficult to measure. Because content delivery systems
do not have access to machines in every location of importance, end-user response times
generally cannot be directly measured. However, network delay can be estimated using server
logs that are maintained by the mirror servers. It should be understood that when a remote
end-user requests content from a particular mirror server, a sequence of messages is passed
back and forth between the user and the server, and the timing of these messages is stored in
server logs. The timing of these messages is used to estimate the overall response time.

Available Logs

There are two types of server logs available from a mirror server. One type is
called TCP logs and the other is called HTTP logs. Each type of log yields a different set of
information.

Using TCP Logs

TCP logs can be used to estimate the characteristics of the connection between
the mirror server and the client/proxy/end-user that is immediately upstream over the
connection, including the immediate round trip delay. For purposes of illustration only,
assume that a system includes a mirror server and an end-user, and that the end-user's Web
browser has opened up a connection with the mirror server. Because TCP logs store two
variables, including the connection establishment time, the immediate round-trip delay and the
per-byte delay can be extracted from the TCP logs.

In embodiments of the present invention, the immediate round-trip delay can be
defined as

$$t_{con\_est} - t_{ack\_send} = \Delta r_{imm},$$

where (1) t_con_est is the time at which the mirror server receives a connection establishment
message (the time at which a connection is established), (2) t_ack_send is the time at which the
mirror server sends an acknowledgement message in reply to a connection request message
received from a client, and (3) Δr_imm is the round trip delay between the server and the entity
that is immediately upstream. Because both t_con_est and t_ack_send are stored in the TCP logs,
Δr_imm can be readily computed.

Furthermore, in embodiments of the present invention the equation

$$t_{ack\_receive}(k) - t_{data\_send}(l) = \Delta r_{imm} + \alpha\times(k-l)$$

can be used to estimate the connection parameters Δr_imm and a constant α describing per-byte
delay, which is the delay associated with each byte in a particular message. Here, t_data_send(l) is
the time at which the l-th byte of data has been sent, and t_ack_receive(k) is the time at which
acknowledgement for the k-th (k>l) byte is received.

Because TCP logs store the time at which the l-th byte of data has been sent and
the time at which acknowledgment for the k-th byte has been received, these stored values can be
plugged into the above equation, along with Δr_imm, in order to determine the per-byte delay α.
Note that due to the TCP flow control mechanism, in general, the α value is not fixed during
the communication. Similarly, Δr_imm may change during the lifetime of the connection.
However, because there are many acknowledgements sent during the course of a single
connection, it is possible to gather statistically significant values, which describe the general
behavior of the connection over time.

Continuing the present example for purposes of illustration only, assume, as
illustrated in FIG. 6, that the TCP logs indicate that the 100th byte of data was sent at time 5,
the 200th byte was sent at time 7, the 300th byte of data was sent at time 8, and the 400th byte of
data was sent at time 9. In addition, assume that the TCP logs indicate that an
acknowledgment for the 100th byte was received at time 10, the acknowledgment for the 200th byte
was received at time 15, the acknowledgment for the 300th byte was received at time 30, and
the acknowledgment for the 400th byte was received at time 31.

By looking at any two pairs of l and k, an estimated per-byte delay can be
determined. Continuing the present example, if two entries are used from the TCP logs, one
entry being the fact that the 100th byte of data was sent at time 5 (see reference character 60)
and the other entry being the fact that acknowledgment for the 200th byte was received at time
15 (see reference character 62), a per-byte delay can be computed. Note, however, that the
data send time and the acknowledgment received time for the same 100 bytes of data (l=k)
cannot be compared, because such a comparison would only yield information on the network
delay. To determine how the size of the message impacts the delay, two different byte
transmissions must be compared.

In the present example, l = 100, k = 200, (k−l) = 100, t_ack_receive(k) = 15, and
t_data_send(l) = 5. Because Δr_imm is known, the value for α can be determined from the previously
defined equation t_ack_receive(k) − t_data_send(l) = Δr_imm + α × (k−l). Different values for α can
also be computed for various pairs of l and k. In preferred embodiments, the α values can be
averaged, or the last computed α value could be selected as the per-byte delay.
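
The per-byte delay computation can be sketched as follows, using the send and acknowledgment times
of the example. The immediate round-trip delay of 2 time units is a hypothetical value introduced
only so that the example can be evaluated; the text does not give one. The function names and the
dictionary layout of the log entries are likewise assumptions.

```python
def per_byte_delay(t_data_send_l, t_ack_receive_k, l, k, rtt_imm):
    """Solve t_ack_receive(k) - t_data_send(l) = rtt_imm + alpha * (k - l) for alpha."""
    if k <= l:
        raise ValueError("k must be greater than l to expose the per-byte delay")
    return (t_ack_receive_k - t_data_send_l - rtt_imm) / (k - l)

def mean_per_byte_delay(sends, acks, rtt_imm):
    """Average alpha over every (l, k) pair with k > l.

    sends maps a byte index to its send time; acks maps a byte index to the
    time its acknowledgment was received.
    """
    values = [
        per_byte_delay(sends[l], acks[k], l, k, rtt_imm)
        for l in sends for k in acks if k > l
    ]
    return sum(values) / len(values)

# Log entries from the example; rtt_imm = 2 is an assumed value.
sends = {100: 5, 200: 7, 300: 8, 400: 9}
acks = {100: 10, 200: 15, 300: 30, 400: 31}
alpha_single = per_byte_delay(sends[100], acks[200], 100, 200, rtt_imm=2)
alpha_mean = mean_per_byte_delay(sends, acks, rtt_imm=2)
```

With the assumed round-trip delay of 2, the single pair l = 100, k = 200 gives
(15 − 5 − 2) / 100 = 0.08 time units per byte, while averaging over all six pairs smooths out the
variation between individual acknowledgments.
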
FIG. 7 illustrates a graphical representation of the previously described example
computation according to embodiments of the present invention. As represented by
t_ack_receive(k) − t_data_send(l) (see reference character 64), the comparison of the elapsed time
between sending the l-th byte of data and receiving acknowledgment of the k-th byte of data
includes round-trip delay information Δr_imm plus (k−l) units of byte delay. The time labeled by
reference character 66 is equivalent to α × (k−l), while Δr_imm is represented in two time periods
identified by reference characters 68 and 70.

It should be understood that the response time estimation methods described
above, as well as those described below, are not necessarily restricted for use in determining
optimal server load distributions for minimizing overall response times. In alternative
embodiments of the present invention, the methods for estimating response times described
herein may be used for other purposes such as, but not limited to, quantifying network
performance.

Using HTTP Logs 

As indicated above, in embodiments of the present invention response times can
also be estimated using HTTP logs. There are three types of HTTP connections: non-persistent
connections, persistent connections, and persistent connections with pipelining.
When non-persistent connections are used, each time an object is requested a connection is
opened, the request is transmitted, the information is received, and then the connection is
closed. The connection does not persist. Thus, each time an object is transferred, a new
connection must be opened. For persistent connections, a connection is opened, requests for
multiple objects are transmitted, the requested information is received, and finally the
connection is closed. The connection is not closed until all of the objects have been received.
For persistent connections with pipelining, a connection is opened and a request for an object
is transmitted, but before that object is received, further requests for objects are
transmitted. Thus, the requests are pipelined within the same connection.

Non-persistent Connections

The mirror server logs for non-persistent connections may contain the following
information:

t_con_req_rec      connection request time (TCP)
t_con_est_send     time at which connection establishment message is sent (TCP)
t_con_est_rec      time at which connection establishment message is received (TCP)
t_req_rec          request retrieval time
t_resp_send_beg    time at which the server starts sending the response
t_resp_send_end    time at which the server stops sending the response

It should be understood that because there may be a proxy server between the
client (end-user) and the mirror server, the expression

$$t_{con\_est\_rec} - t_{con\_est\_send}$$

cannot be used to estimate the server-client round trip delay. If a proxy server exists between
the end-user and the mirror server, the end-user makes a request to the proxy server, and then
the proxy server makes a request to the mirror server. Once the requested information has
been retrieved by the mirror server, the mirror server sends that information back to the proxy
server, and then the proxy server sends the information back to the end-user. It is therefore
not possible to estimate the round trip delay from the log of a single non-persistent connection
because t_req_rec − t_con_est_send is almost zero and independent of the transmission delay, and
because the connection does not have an explicit connection close message.

However, in embodiments of the present invention the time between two
non-persistent connections can be used to estimate the round trip delay. Assuming that an
end-user's Web browser requests two consecutive objects, o_i and o_{i+1}, in a single page without a
delay between them, then

$$t_{con\_req\_rec}(i+1) - t_{resp\_send\_end}(i) = \Delta r_{server,client}.$$

This equation represents the time between when a mirror server sends the object
for the i-th request back, t_resp_send_end(i), and when the mirror server receives a connection
request for the next, (i+1)-th, object. This essentially is the time from the end of a first request
to the time of the beginning of a second request.
It should be noted that even if there may be a proxy server between the end-user
and the mirror server, as long as the client is using non-persistent connections, the above
equation will be a good estimate of the round trip delay. Furthermore, even if the client is
using a (non-pipelined) persistent connection, but a proxy server is splitting the connection into
multiple non-persistent connections, the average Δr_server,client will give a reasonable estimate of
the round trip delay between the client and the server. However, if the client (or the
intermediate proxy server) is using simultaneous non-persistent connections, Δr_server,client cannot
be estimated using the delay between two consecutive connections.
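
The estimate from consecutive non-persistent connections can be sketched as follows. The input
layout (one `(t_con_req_rec, t_resp_send_end)` tuple per connection, ordered by time, for a single
client) is an assumption of this example, as is the choice to discard non-positive gaps as a sign
of simultaneous connections.

```python
from statistics import mean

def rtt_from_consecutive(connections):
    """Average gap between the end of one response and the next connection request.

    connections is a time-ordered list of (t_con_req_rec, t_resp_send_end)
    tuples for one client. Returns None when no usable sample exists.
    """
    samples = [nxt[0] - cur[1] for cur, nxt in zip(connections, connections[1:])]
    # A next request that arrives before the previous response ended indicates
    # simultaneous connections, for which this estimate does not apply.
    samples = [s for s in samples if s > 0]
    return mean(samples) if samples else None
```

Averaging over several consecutive connections follows the observation above that the average
Δr_server,client gives a reasonable estimate even when a proxy splits a persistent connection.
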



Persistent Connections

The server logs for persistent, non-pipelined, connections may contain the
following information:

t_con_req_rec         connection request time (TCP)
t_con_est_send        time at which connection establishment message is sent (TCP)
t_con_est_rec         time at which connection establishment message is received (TCP)
t_req_rec(j)          j-th request retrieval time
t_resp_send_beg(j)    time at which the server starts sending the response for the j-th request
t_resp_send_end(j)    time at which the server stops sending the response for the j-th request
t_con_close_rec       time at which the server receives a request to close the connection

In embodiments of the present invention, the following equation can be used to
estimate the round trip delay between the client and the server:

$$t_{con\_close\_rec} - t_{resp\_send\_end}(last) = \Delta r_{server,client},$$

or more generally, the following equation can be used:

$$t_{req\_rec}(j+1) - t_{resp\_send\_end}(j) = \Delta r_{server,client}.$$

Estimation of round trip delay for persistent connections is very similar to the
estimation of round trip delay for non-persistent connections, but is generally more accurate.
For non-persistent connections, an assumption was necessary that the same client browser
requested two consecutive objects in a single page without delays between them. With
persistent connections, it is known that all of the requests within that particular connection are
from the same client browser for consecutive objects in a single page and that there are no
other delays between them.
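
For a persistent, non-pipelined connection, the round trip samples can be collected as sketched
below. The list-based inputs (`req_rec[j]` and `resp_send_end[j]` for the j-th request of one
connection) and the optional close time are assumptions of this example.

```python
def persistent_rtt_samples(req_rec, resp_send_end, con_close_rec=None):
    """Round-trip samples t_req_rec(j+1) - t_resp_send_end(j) within one connection.

    If the connection close time was logged, the gap between the last response
    and the close request is added as a further sample.
    """
    samples = [req_rec[j + 1] - resp_send_end[j] for j in range(len(req_rec) - 1)]
    if con_close_rec is not None:
        samples.append(con_close_rec - resp_send_end[-1])
    return samples
```

Because every sample comes from the same connection, no assumption about which client issued the
consecutive requests is needed.
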

Persistent Connections with Pipelining 

The server logs for persistent pipelined connections are similar to the logs of
non-pipelined persistent connections. Therefore, in embodiments of the present invention, the
round trip delay between the client and the server can still be estimated as

$$t_{con\_close\_rec} - t_{resp\_send\_end}(last) = \Delta r_{server,client}.$$

With persistent pipelined connections, the requests are generally overlapped and
the only time information available is the last request, t_resp_send_end(last). Using the last
request, the calculation is then performed in a manner similar to persistent connections,
described above. It should be noted that not all Web browsers send connection close messages,
and therefore the above equation may not always be available to estimate round trip delay time
when persistent pipelined connections are used. It should further be noted that even though the
last request is used in persistent pipelined connections, all that is required is a single sample of
the round trip delay time. Thus, even though only one data point is available in this situation,
it is sufficient.

Estimating the Response Time Observed by the Client during a single HTTP Connection

The preceding sections described the computation of an estimated round trip
delay time according to embodiments of the present invention. However, to use the heuristic
algorithm previously described, the overall response time observed by the client must be
estimated. This overall response time includes the round trip delay time and the server delay
required for a mirror server to process a request. Using the information collected from the
HTTP logs, the response time observed by the client software can be estimated using different
techniques, depending on the nature of the connection.

For non-persistent connections, in embodiments of the present invention the
response time of the server can be estimated as

    $(t_{resp\_send\_end} - t_{con\_req\_rec}) + 2 \times \frac{\Delta r_{server,client}}{2} = t_{resp\_send\_end} - t_{con\_req\_rec} + \Delta r_{server,client}$.

The first term above represents the time that the server has been involved in the
process (server delay). The latter term represents the round trip delay, that is, the two network
delays observed by the client (before the server receives the connection request, and after the
server sends the last byte).

For persistent connections without pipelining, in embodiments of the present
invention the response time of the server can be estimated as

    $(t_{con\_close\_rec} - t_{con\_req\_rec}) + \frac{\Delta r_{server,client}}{2} - \frac{\Delta r_{server,client}}{2} = t_{con\_close\_rec} - t_{con\_req\_rec}$.

The first term above (in parentheses) gives the time that the server has been
involved in the process (server delay). Note that in order to find the response time observed
by the end-user, the network delay during the connection establishment must be added, and the
network delay during the connection close must be subtracted.

FIG. 8 illustrates a graphical representation of the above equation according to
embodiments of the present invention. If the server delay $(t_{con\_close\_rec} - t_{con\_req\_rec})$ (see reference
character 72) is added to the network delay during connection establishment $\Delta r_{server,client}/2$ (see
reference character 74), and the network delay during connection close $\Delta r_{server,client}/2$ (see
reference character 76) is subtracted, the overall response time seen by the end-user (see
reference character 78) can be estimated. Note that in the above equation, the delays
represented by reference characters 74 and 76 may not, in fact, cancel, but in alternative
embodiments, because the computed overall response time is only an estimation, it can be
assumed that they do cancel.

For persistent connections with pipelining, according to embodiments of the
present invention the overall response time can be estimated in a manner similar to the case of
persistent connections without pipelining, as

    $t_{con\_close\_rec} - t_{con\_req\_rec}$.
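
By way of illustration only, the per-connection response time estimates above might be sketched as follows. The function and parameter names are hypothetical; `close_rtt` is an assumed extra parameter that lets the network delay at connection close differ from the delay at connection establishment, so the cancellation discussed above is visible as a default rather than hard-coded.

```python
from typing import Optional


def response_time_non_persistent(resp_send_end: float,
                                 con_req_rec: float,
                                 rtt: float) -> float:
    """Server-side span plus two one-way network delays (2 * rtt / 2)."""
    return (resp_send_end - con_req_rec) + 2 * (rtt / 2)


def response_time_persistent(con_close_rec: float,
                             con_req_rec: float,
                             rtt: float,
                             close_rtt: Optional[float] = None) -> float:
    """Server-side span, plus establishment delay, minus close delay.

    With close_rtt left as None the two half round trips are assumed equal
    and cancel, which also gives the pipelined estimate.
    """
    if close_rtt is None:
        close_rtt = rtt
    return (con_close_rec - con_req_rec) + rtt / 2 - close_rtt / 2


print(response_time_non_persistent(10.0, 2.0, 1.0))           # 9.0
print(response_time_persistent(10.0, 2.0, 1.0))               # 8.0
print(response_time_persistent(10.0, 2.0, 1.0, close_rtt=0.5))  # 8.25
```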

Estimating the Response Time Observed by a Client During a Page Retrieval

Retrieval of a page consists of the retrieval of an HTML document followed by
the set of objects within that document. This process may be performed in single or multiple
HTTP connections. Because the content of an HTML document is known, it is also known
which objects a browser may request. Thus, using the server logs, the time at which all
objects are sent to the user can be determined. By using the server logs and the round trip
delay estimations as described above, the total response time observed by the client can be
determined.

Note, however, that if requests for some objects within the page do not arrive at
the mirror server because those objects are cached in proxy servers or in the client itself, the
response time estimation is not trivial. A limit period must be established such that if a new
connection/object request does not arrive at the mirror server within this limit period, it can be
assumed that the client already has the remaining objects within the HTML document.
Assuming that the client will not sit idle between the object requests, the limit period can be
established as

    $limit = \Delta r_{server,client}$.

In embodiments of the present invention, if no further requests are received
within limit units of time after the last request for an object in a given page is served,
the overall response time estimation process may be terminated, and it can be
assumed that the entire page has been delivered to the client. In other words, if an object is
cached or is stored in a proxy server, a request for that object will never be received by the
mirror server. The estimation process should not wait indefinitely for that request, because it
will never be received. Thus, if the process waits a period of time equivalent to the round trip
delay time, and no requests for any objects are received, it can be assumed that the request for
that object has already been served by the cache or proxy server.

Note that if there were no objects in the cache or proxy server, it should take no
longer than the round trip delay time to fetch an object from the mirror server. Thus, by
waiting no longer than the round trip delay time for a particular object, the process essentially
accounts for the time it would take for the mirror server to process a request for that object.
Thus, the wait time can be included in the estimated response time observed by an end-user or
client.
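
By way of illustration only, the limit-period rule might be sketched as follows. The function name and the `(req_rec, resp_send_end)` tuple layout are hypothetical, and the way the final wait is added to the served span is one possible reading of the passage above, not a definitive implementation.

```python
from typing import List, Tuple


def page_response_time(events: List[Tuple[float, float]], rtt: float) -> float:
    """Estimate the response time for one page from server-side timestamps.

    events: (req_rec, resp_send_end) pairs for one client, sorted by req_rec.
    rtt:    estimated round trip delay, used as the limit period.
    """
    if not events:
        raise ValueError("at least one served request is required")
    limit = rtt  # limit = round trip delay between server and client
    start = events[0][0]
    last_end = events[0][1]
    for req_rec, resp_send_end in events[1:]:
        # A request arriving after the limit period is not counted toward this
        # page; the remaining objects are assumed to be cached elsewhere.
        if req_rec - last_end > limit:
            break
        last_end = max(last_end, resp_send_end)
    # The wait itself is included in the estimate (assumption of this sketch).
    return (last_end - start) + limit


log = [(0.0, 1.0), (1.25, 2.0), (5.0, 6.0)]
print(page_response_time(log, rtt=0.5))  # 2.5
```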

Therefore, embodiments of the present invention provide a system and method
for redirecting end-users to mirror servers in the same region as the requesting end-user, or
other regions, using assignments that minimize the overall response time seen by users of the
content delivery system. Embodiments of the present invention also provide a system and
method for redirecting end-users to mirror servers using assignments that balance the loads of
the mirror servers while taking into account load capability.

In addition, embodiments of the present invention provide a system and method
for redirecting end-users to mirror servers using assignments that minimize the overall response
time seen by users of the content delivery system, wherein an increase in resources due to the
addition of a new mirror server or the service termination of a customer content provider will
not cause a load redistribution unless load balancing constraints are violated. Furthermore, in
embodiments of the present invention, a new customer content provider will be added only if
the overall response time is maintained below a specified threshold, and changes to the loads of
existing customers will not change the overall response time so significantly that it exceeds a
specified threshold.
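
By way of illustration only, the average observed response time (AORT) that these assignments minimize, and the threshold test for adding a customer, might be sketched as follows. The function names and the nested-list data layout are hypothetical. The normalizing denominator (the sum of $c_i \times p_i$) is an assumption of this sketch, and the admission test follows the condition $AORT_{new} < (1 + \Phi) \times AORT_{old}$ recited in the claims.

```python
from typing import List


def aort(a: List[List[List[float]]],  # a[i][j][k]: share of C_i/G_j load sent to S_k
         u: List[List[float]],        # u[i][j]: share of C_i requests from G_j
         c: List[float],              # c[i]: total load of customer C_i
         p: List[float],              # p[i]: priority of customer C_i
         sdel: List[float],           # sdel[k]: server delay of S_k
         ndel: List[List[float]]      # ndel[j][k]: network delay from G_j to S_k
         ) -> float:
    total = 0.0
    for i in range(len(c)):
        for j in range(len(ndel)):
            for k in range(len(sdel)):
                total += (a[i][j][k] * u[i][j] * c[i] * p[i]
                          * (sdel[k] + ndel[j][k]))
    # Assumed normalization: priority-weighted total load.
    return total / sum(ci * pi for ci, pi in zip(c, p))


def may_admit(aort_old: float, aort_new: float, phi: float) -> bool:
    """True if the new mapping keeps AORT within the allowed change phi."""
    return aort_new < (1 + phi) * aort_old


# One customer, one region, two servers.
c, p, u = [10.0], [1.0], [[1.0]]
sdel, ndel = [1.0, 2.0], [[1.0, 3.0]]
print(aort([[[0.5, 0.5]]], u, c, p, sdel, ndel))  # 3.5
print(aort([[[1.0, 0.0]]], u, c, p, sdel, ndel))  # 2.0
print(may_admit(2.0, 2.1, phi=0.1))               # True
```

Sending all of the load to the server with the smaller combined delay lowers AORT from 3.5 to 2.0 in this example, which is the effect the mapping is meant to achieve when capacity allows.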

WHAT IS CLAIMED IS:

1. In a content delivery system having $m$ servers, $S = \{S_1, \ldots, S_m\}$, $n$ active
customers, $C = \{C_1, \ldots, C_n\}$, and $g$ geographic locations, $G = \{G_1, \ldots, G_g\}$, wherein $sdel_k$ is a
server delay of server $S_k$, $ndel_{j,k}$ is a network delay observed by customers in geographic
location $G_j$ while retrieving content from server $S_k$, $p_i$ is a priority value for customer $C_i$, $c_i$ is a
total load of customer $C_i$, $u_{i,j}$ is a fraction of requests coming to customer $C_i$ from region $G_j$,
$a_{i,j,k}$ is a mapping representing a fraction of requests coming to customer $C_i$ from region $G_j$ that
have been redirected to server $S_k$, and $s_k$ represents a load capacity of server $S_k$, a method for
distributing server loads, the method comprising the steps of:

    representing an average prioritized observed response time as

    $AORT = \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} \sum_{k=1}^{m} a_{i,j,k} \times u_{i,j} \times c_i \times p_i \times (sdel_k + ndel_{j,k})}{\sum_{i=1}^{n} c_i \times p_i}$; and

    generating a mapping that assigns requests from customers to a particular
    server while minimizing AORT.

2. A method as recited in claim 1, further including the step of assigning all
requests from all customers in all regions to a particular server such that, for each
$C_i \in C$, $G_j \in G$, $\sum_{k=1}^{m} a_{i,j,k} = 1.0$.



3. A method as recited in claim 1, further including the step of assigning
requests to a particular server while ensuring that the load capacity of each server is not
exceeded such that, for each $S_k \in S$, $\sum_{i=1}^{n} \sum_{j=1}^{g} a_{i,j,k} \times u_{i,j} \times c_i \leq s_k$.

4. A method as recited in claim 1, further including the step of assigning
requests to a particular server while balancing the load of each server to within a maximum
allowed deviation from a balanced state $\Theta$ such that, for all pairs of servers, $S_k$ and $S_l$,

    $\frac{\sum_{i=1}^{n} \sum_{j=1}^{g} a_{i,j,k} \times u_{i,j} \times c_i}{s_k} < (1 + \Theta) \times \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} a_{i,j,l} \times u_{i,j} \times c_i}{s_l}$.



5. A method as recited in claim 4, wherein if the content delivery system
should add one or more servers, or remove one or more customers, the load of each server will
not be redistributed unless the maximum allowed deviation from a balanced state $\Theta$ is
exceeded.

6. A method as recited in claim 1, further including the step of adding one
or more customers to the content delivery system only if $AORT_{new} < (1 + \Phi) \times AORT_{old}$;

    wherein $AORT_{old}$ and $AORT_{new}$ are old and new values of AORT defined as

    $AORT_{old} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} \sum_{k=1}^{m} a_{i,j,k} \times u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})}{\sum_{i=1}^{n} c_i}$

    and

    $AORT_{new} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} \sum_{k=1}^{m} a'_{i,j,k} \times u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})}{\sum_{i=1}^{n} c_i}$;

    wherein $a'_{i,j,k}$ is a new mapping resulting from the addition of one or
    more customers; and

    wherein $\Phi$ is an allowable change in AORT for existing clients.

7. A method as recited in claim 1, further including the step of using a
linear constraint solver to generate the mapping.

8. A method as recited in claim 1, further including the step of using a
non-linear constraint solver to generate the mapping.

9. A method as recited in claim 1, further including the step of using a
heuristic algorithm to generate the mapping, the heuristic algorithm comprising the step of
assigning large $a_{i,j,k}$ values to small $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ values to produce a smaller overall
AORT value.

10. A method as recited in claim 9, the heuristic algorithm comprising the
steps of:

    generating a plurality of sorted lists by sorting $C_i$ values in increasing
    order of $c_i$, sorting $\langle C_i, G_j \rangle$ pairs in increasing order of $u_{i,j}$, sorting $S_k$ values in
    increasing order of $sdel_k$, and sorting $\langle G_j, S_k \rangle$ pairs in increasing order of $ndel_{j,k}$;

    starting with a top-most, smallest value item in each list, identifying
    comparable smallest-value items from the other lists to generate a plurality of
    $\langle C_i, G_j, S_k \rangle$ triples equivalent to the number of sorted lists;

    selecting from the plurality of $\langle C_i, G_j, S_k \rangle$ triples, the $\langle C_i, G_j, S_k \rangle$ triple with
    the smallest $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ value;

    assigning to a server $S_k$ of the selected $\langle C_i, G_j, S_k \rangle$ triple a remaining load
    from the $\langle C_i, G_j \rangle$ pair; and

    repeating the heuristic algorithm starting with generating the plurality of
    sorted lists, taking into account the changes in the values of the $C_i$ values and the
    $\langle C_i, G_j \rangle$ pairs as a result of the previous server assignment during each iteration, until
    the load from all $\langle C_i, G_j \rangle$ pairs has been assigned to a server $S_k$;

    wherein if, during any iteration of the heuristic algorithm, the load
    capacity of the server $S_k$ is not sufficient to handle the remaining load, the remaining load
    capacity of the server $S_k$ is assigned to some of the load of the $\langle C_i, G_j \rangle$ pair, and an
    unassigned portion of the load from the $\langle C_i, G_j \rangle$ pair is reinserted into the iterative process.

11. A method as recited in claim 10, the heuristic algorithm further including
the steps of:

    generating a load-capacity prioritized sorted list by sorting $C_i$ values in
    decreasing order of remaining load capacity $s_k$;

    starting with a top-most, largest value item in the load-capacity
    prioritized list, identifying comparable smallest-value items from the other lists to generate a
    load-capacity prioritized $\langle C_i, G_j, S_k \rangle$ triple;

    considering the load-capacity prioritized $\langle C_i, G_j, S_k \rangle$ triple in the selection
    of the top-most $\langle C_i, G_j, S_k \rangle$ triple with the smallest $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ value; and

    repeating the heuristic algorithm starting with generating the plurality of
    sorted lists, taking into account the changes in the values of the $C_i$ values, the
    $\langle C_i, G_j \rangle$ pairs, and remaining load capacity as a result of the previous server assignment
    during each iteration, until the load from all $\langle C_i, G_j \rangle$ pairs has been assigned to a server $S_k$.

12. A method as recited in claim 10, the heuristic algorithm further including
the steps of:

    generating a list of content-available $\langle C_i, S_k \rangle$ pairs in which the content of
    customer $C_i$ is stored in server $S_k$; and

    selecting the $\langle C_i, G_j, S_k \rangle$ triple with the smallest $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$
    value that is also part of the list of content-available $\langle C_i, S_k \rangle$ pairs;

    wherein if, during any iteration of the heuristic algorithm, there is no
    $\langle C_i, G_j, S_k \rangle$ triple that is also part of the list of content-available $\langle C_i, S_k \rangle$ pairs, a
    suitable $\langle C_i, G_j, S_k \rangle$ triple with the smallest $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ value is chosen
    and the data of customer $C_i$ is migrated to server $S_k$.

13. A method as recited in claim 12, the heuristic algorithm further including
the step of generating a list of content-unavailable $\langle C_i, S_k \rangle$ pairs in increasing order of migration
time penalty for which the content of customer $C_i$ is not stored in server $S_k$;

    wherein if, during any iteration of the heuristic algorithm, there is no
    $\langle C_i, G_j, S_k \rangle$ triple that is also part of the list of content-available $\langle C_i, S_k \rangle$ pairs, a
    suitable $\langle C_i, G_j, S_k \rangle$ triple with the smallest combined $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ value
    and $\langle C_i, S_k \rangle$ migration time penalty is chosen and the data of customer $C_i$ is migrated to
    server $S_k$.

14. A method as recited in claim 1, further including the step of estimating
$ndel_{j,k}$ for non-persistent connections using HTTP logs, the step of estimating $ndel_{j,k}$ for
non-persistent connections using HTTP logs comprising the steps of:

    computing an estimated round trip delay $\Delta r_{server,client}$ as
    $t_{con\_req\_rec}(i+1) - t_{resp\_send\_end}(i)$ from information stored in the HTTP logs, where
    $t_{con\_req\_rec}(i+1)$ represents a time at which a connection request message is received by the
    server for an $(i+1)^{th}$ object, and $t_{resp\_send\_end}(i)$ represents a time at which the server stops
    sending an $i^{th}$ object; and

    computing the response time as
    $(t_{resp\_send\_end} - t_{con\_req\_rec}) + 2 \times \frac{\Delta r_{server,client}}{2}$, where $t_{con\_req\_rec}$ represents a time at
    which a connection request message is received by the server for an object, and
    $t_{resp\_send\_end}$ represents a time at which the server stops sending the object.

15. A method as recited in claim 1, further including the step of estimating
$ndel_{j,k}$ for persistent connections using HTTP logs, the step of estimating $ndel_{j,k}$ for persistent
connections using HTTP logs comprising the steps of:

    computing an estimated round trip delay $\Delta r_{server,client}$ as
    $t_{con\_close\_rec} - t_{resp\_send\_end}(last)$, where $t_{con\_close\_rec}$ represents a time at which the server
    receives a request to close the persistent connection, and $t_{resp\_send\_end}(last)$ represents a time
    at which the server stops sending a response for a last request; and

    computing the response time as
    $(t_{con\_close\_rec} - t_{con\_req\_rec}) + \frac{\Delta r_{server,client}}{2} - \frac{\Delta r_{server,client}}{2}$, where $t_{con\_req\_rec}$ represents a
    connection request time.

16. A method for estimating a per-byte network delay $\alpha$ observed by a
requesting entity in a geographic location while retrieving data from a server using TCP logs,
the method comprising the steps of:

    computing an immediate round-trip delay $\Delta r_{imm}$ between the server and
    the requesting entity as $t_{con\_est} - t_{ack\_sent}$ from information stored in the TCP logs, where
    $t_{con\_est}$ is a time at which a connection with the server is established, and $t_{ack\_sent}$ is a time at
    which the server sends an acknowledgement to a connection request message received from the
    requesting entity; and

    determining the per-byte network delay $\alpha$ using the computed immediate
    round-trip delay $\Delta r_{imm}$, an equation $t_{ack\_receive}(k) - t_{data\_send}(l) = \Delta r_{imm} + \alpha \times (k - l)$, and
    information stored in the TCP logs, where $t_{data\_send}(l)$ is a time at which an $l^{th}$ byte of the
    data has been sent by the server, and $t_{ack\_receive}(k)$ is a time at which acknowledgement for a
    $k^{th}$ $(k > l)$ byte of the data is received by the server.

17. A method for estimating a response time observed by a requesting entity
in a geographic location while retrieving objects from a server through a non-persistent
connection using HTTP logs, the method comprising the steps of:

    computing an estimated round trip delay $\Delta r_{server,client}$ as
    $t_{con\_req\_rec}(i+1) - t_{resp\_send\_end}(i)$ from information stored in the HTTP logs, where
    $t_{con\_req\_rec}(i+1)$ represents a time at which a connection request message is received by the
    server from the requesting entity for an $(i+1)^{th}$ object, and $t_{resp\_send\_end}(i)$ represents a time at
    which the server stops sending an $i^{th}$ object to the requesting entity; and

    computing the response time as
    $(t_{resp\_send\_end} - t_{con\_req\_rec}) + 2 \times \frac{\Delta r_{server,client}}{2}$, where $t_{con\_req\_rec}$ represents a time at
    which a connection request message is received by the server from the requesting entity for an
    object, and $t_{resp\_send\_end}$ represents a time at which the server stops sending the object to the
    requesting entity.

18. A method for estimating a response time observed by a requesting entity
in a geographic location while retrieving objects from a server through a persistent connection
using HTTP logs, the method comprising the steps of:

    computing an estimated round trip delay $\Delta r_{server,client}$ as
    $t_{con\_close\_rec} - t_{resp\_send\_end}(last)$, where $t_{con\_close\_rec}$ represents a time at which the server
    receives a request to close the persistent connection, and $t_{resp\_send\_end}(last)$ represents a time
    at which the server stops sending a response for a last request; and

    computing the response time as
    $(t_{con\_close\_rec} - t_{con\_req\_rec}) + \frac{\Delta r_{server,client}}{2} - \frac{\Delta r_{server,client}}{2}$, where $t_{con\_req\_rec}$ represents a
    connection request time.

19. In a content delivery network having $m$ servers, $S = \{S_1, \ldots, S_m\}$, $n$ active
customers, $C = \{C_1, \ldots, C_n\}$, and $g$ geographic locations, $G = \{G_1, \ldots, G_g\}$, a content delivery
system for distributing server loads, the content delivery system comprising:

    memory for storing a server delay $sdel_k$ of server $S_k$, a network delay
    $ndel_{j,k}$ observed by customers in geographic location $G_j$ while retrieving content from server
    $S_k$, a priority value $p_i$ for customer $C_i$, a total load $c_i$ of customer $C_i$, a fraction of requests
    $u_{i,j}$ coming to customer $C_i$ from region $G_j$, a mapping $a_{i,j,k}$ representing a fraction of requests
    coming to customer $C_i$ from region $G_j$ that have been redirected to server $S_k$, and a load
    capacity of server $S_k$; and

    a processor programmed for

        representing an average prioritized observed response time as

        $AORT = \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} \sum_{k=1}^{m} a_{i,j,k} \times u_{i,j} \times c_i \times p_i \times (sdel_k + ndel_{j,k})}{\sum_{i=1}^{n} c_i \times p_i}$, and

        generating a mapping that assigns requests from customers to a
        particular server while minimizing AORT.

20. A system as recited in claim 19, the processor further programmed for
assigning all requests from all customers in all regions to a particular server such that, for each
$C_i \in C$, $G_j \in G$, $\sum_{k=1}^{m} a_{i,j,k} = 1.0$.

21. A system as recited in claim 19, the processor further programmed for
assigning requests to a particular server while ensuring that the load capacity of each server is
not exceeded such that, for each $S_k \in S$, $\sum_{i=1}^{n} \sum_{j=1}^{g} a_{i,j,k} \times u_{i,j} \times c_i \leq s_k$.

22. A system as recited in claim 19, the processor further programmed for
assigning requests to a particular server while balancing the load of each server to within a
maximum allowed deviation from a balanced state $\Theta$ such that, for all pairs of servers, $S_k$ and
$S_l$,

    $\frac{\sum_{i=1}^{n} \sum_{j=1}^{g} a_{i,j,k} \times u_{i,j} \times c_i}{s_k} < (1 + \Theta) \times \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} a_{i,j,l} \times u_{i,j} \times c_i}{s_l}$.



23. A system as recited in claim 22, wherein if the content delivery system
should add one or more servers, or remove one or more customers, the processor is further
programmed for not redistributing the load of each server unless the maximum allowed
deviation from a balanced state $\Theta$ is exceeded.

24. A system as recited in claim 19, wherein the processor is programmed
for allowing one or more customers to be added to the content delivery system only if
$AORT_{new} < (1 + \Phi) \times AORT_{old}$;

    wherein $AORT_{old}$ and $AORT_{new}$ are old and new values of AORT defined as

    $AORT_{old} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} \sum_{k=1}^{m} a_{i,j,k} \times u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})}{\sum_{i=1}^{n} c_i}$

    and

    $AORT_{new} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} \sum_{k=1}^{m} a'_{i,j,k} \times u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})}{\sum_{i=1}^{n} c_i}$;

    wherein $a'_{i,j,k}$ is a new mapping resulting from the addition of one or
    more customers; and

    wherein $\Phi$ is an allowable change in AORT for existing clients.

25. A system as recited in claim 19, the processor further programmed for
using a linear constraint solver to generate the mapping.

26. A system as recited in claim 19, the processor further programmed for
using a non-linear constraint solver to generate the mapping.

27. A system as recited in claim 19, the processor further programmed for
generating the mapping by assigning large $a_{i,j,k}$ values to small $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ values to
produce a smaller overall AORT value.

28. A system as recited in claim 27, the processor further programmed for:

    generating a plurality of sorted lists by sorting $C_i$ values in increasing
    order of $c_i$, sorting $\langle C_i, G_j \rangle$ pairs in increasing order of $u_{i,j}$, sorting $S_k$ values in
    increasing order of $sdel_k$, and sorting $\langle G_j, S_k \rangle$ pairs in increasing order of $ndel_{j,k}$;

    starting with a top-most, smallest value item in each list, identifying
    comparable smallest-value items from the other lists to generate a plurality of
    $\langle C_i, G_j, S_k \rangle$ triples equivalent to the number of sorted lists;

    selecting from the plurality of $\langle C_i, G_j, S_k \rangle$ triples, the $\langle C_i, G_j, S_k \rangle$ triple with
    the smallest $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ value;

    assigning to a server $S_k$ of the selected $\langle C_i, G_j, S_k \rangle$ triple a remaining load
    from the $\langle C_i, G_j \rangle$ pair; and

    repeating the heuristic algorithm starting with generating the plurality of
    sorted lists, taking into account the changes in the values of the $C_i$ values and the
    $\langle C_i, G_j \rangle$ pairs as a result of the previous server assignment during each iteration, until
    the load from all $\langle C_i, G_j \rangle$ pairs has been assigned to a server $S_k$;

    wherein if, during any iteration of the heuristic algorithm, the load
    capacity of the server $S_k$ is not sufficient to handle the remaining load, the remaining load
    capacity of the server $S_k$ is assigned to some of the load of the $\langle C_i, G_j \rangle$ pair, and an
    unassigned portion of the load from the $\langle C_i, G_j \rangle$ pair is reinserted into the iterative process.

29. A system as recited in claim 28, the processor further programmed for:

    generating a load-capacity prioritized sorted list by sorting $C_i$ values in
    decreasing order of remaining load capacity $s_k$;

    starting with a top-most, largest value item in the load-capacity
    prioritized list, identifying comparable smallest-value items from the other lists to generate a
    load-capacity prioritized $\langle C_i, G_j, S_k \rangle$ triple;

    considering the load-capacity prioritized $\langle C_i, G_j, S_k \rangle$ triple in the selection
    of the top-most $\langle C_i, G_j, S_k \rangle$ triple with the smallest $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ value; and

    repeating the heuristic algorithm starting with generating the plurality of
    sorted lists, taking into account the changes in the values of the $C_i$ values, the
    $\langle C_i, G_j \rangle$ pairs, and remaining load capacity as a result of the previous server assignment
    during each iteration, until the load from all $\langle C_i, G_j \rangle$ pairs has been assigned to a server $S_k$.

30. A system as recited in claim 28, the processor further programmed for:

    generating a list of content-available $\langle C_i, S_k \rangle$ pairs in which the content of
    customer $C_i$ is stored in server $S_k$; and

    selecting the $\langle C_i, G_j, S_k \rangle$ triple with the smallest $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$
    value that is also part of the list of content-available $\langle C_i, S_k \rangle$ pairs;

    wherein if, during any iteration of the heuristic algorithm, there is no
    $\langle C_i, G_j, S_k \rangle$ triple that is also part of the list of content-available $\langle C_i, S_k \rangle$ pairs, a
    suitable $\langle C_i, G_j, S_k \rangle$ triple with the smallest $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ value is chosen
    and the data of customer $C_i$ is migrated to server $S_k$.

31. A system as recited in claim 30, the processor further programmed for
generating a list of content-unavailable $\langle C_i, S_k \rangle$ pairs in increasing order of migration time
penalty for which the content of customer $C_i$ is not stored in server $S_k$;

    wherein if, during any iteration of the heuristic algorithm, there is no
    $\langle C_i, G_j, S_k \rangle$ triple that is also part of the list of content-available $\langle C_i, S_k \rangle$ pairs, the
    processor is further programmed for selecting a suitable $\langle C_i, G_j, S_k \rangle$ triple with the smallest
    combined $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ value and $\langle C_i, S_k \rangle$ migration time penalty, and migrating
    the data of customer $C_i$ to server $S_k$.



32. A system as recited in claim 19, the processor further programmed for
estimating $ndel_{j,k}$ for non-persistent connections using HTTP logs by:

    computing an estimated round trip delay $\Delta r_{server,client}$ as
    $t_{con\_req\_rec}(i+1) - t_{resp\_send\_end}(i)$ from information stored in the HTTP logs, where
    $t_{con\_req\_rec}(i+1)$ represents a time at which a connection request message is received by the
    server for an $(i+1)^{th}$ object, and $t_{resp\_send\_end}(i)$ represents a time at which the server stops
    sending an $i^{th}$ object; and

    computing the response time as
    $(t_{resp\_send\_end} - t_{con\_req\_rec}) + 2 \times \frac{\Delta r_{server,client}}{2}$, where $t_{con\_req\_rec}$ represents a time at
    which a connection request message is received by the server for an object, and
    $t_{resp\_send\_end}$ represents a time at which the server stops sending the object.

33. A system as recited in claim 19, the processor further programmed for
estimating $ndel_{j,k}$ for persistent connections using HTTP logs by:

    computing an estimated round trip delay $\Delta r_{server,client}$ as
    $t_{con\_close\_rec} - t_{resp\_send\_end}(last)$, where $t_{con\_close\_rec}$ represents a time at which the server
    receives a request to close the persistent connection, and $t_{resp\_send\_end}(last)$ represents a time
    at which the server stops sending a response for a last request; and

    computing the response time as
    $(t_{con\_close\_rec} - t_{con\_req\_rec}) + \frac{\Delta r_{server,client}}{2} - \frac{\Delta r_{server,client}}{2}$, where $t_{con\_req\_rec}$ represents a
    connection request time.

ABSTRACT

A content delivery system having $m$ servers, $S = \{S_1, \ldots, S_m\}$, $n$ active
customers, $C = \{C_1, \ldots, C_n\}$, and $g$ geographic locations, $G = \{G_1, \ldots, G_g\}$, is disclosed,
wherein $sdel_k$ is a server delay of server $S_k$, $ndel_{j,k}$ is a network delay observed by customers in
geographic location $G_j$ while retrieving content from server $S_k$, $p_i$ is a priority value for
customer $C_i$, $c_i$ is a total load of customer $C_i$, $u_{i,j}$ is a fraction of requests coming to customer
$C_i$ from region $G_j$, $a_{i,j,k}$ is a mapping representing a fraction of requests coming to customer $C_i$
from region $G_j$ that have been redirected to server $S_k$, and $s_k$ represents a load capacity of
server $S_k$. Within such a system, a method for distributing server loads includes the steps of
representing an average prioritized observed response time as

    $AORT = \frac{\sum_{i=1}^{n} \sum_{j=1}^{g} \sum_{k=1}^{m} a_{i,j,k} \times u_{i,j} \times c_i \times p_i \times (sdel_k + ndel_{j,k})}{\sum_{i=1}^{n} c_i \times p_i}$,

and then generating a mapping that assigns requests from customers to a particular server while
minimizing AORT. A heuristic algorithm is used to generate the mapping, wherein large $a_{i,j,k}$
values are assigned to small $u_{i,j} \times c_i \times (sdel_k + ndel_{j,k})$ values to produce a smaller overall AORT
value.